Goal Misalignment

Agentic AI systems optimise for the objectives they are given.

Failure Type: Ethical, Organisational, Strategic

Risk Origin: Agentic AI, Internal

Share this failure mode:

What it means

Agentic AI systems optimise for the objectives they are given. The failure occurs when the stated objective does not accurately capture what the organisation actually values. A system tasked with “increasing engagement” may generate harmful content. A system tasked with “reducing costs” may take actions that damage quality or relationships.

Why it matters

Goal misalignment is not a malfunction; the system is doing exactly what it was designed to do. The failure is in the governance decision about what it was designed to do. Objectives given to AI systems reflect the organisation’s values; their consequences are a board-level responsibility.

Board governance implications

Before deploying any agentic system, the board must review the objective specification, not just the technical design. The question is not, “Does this system work?” but, “Does this system optimise for what we actually value?”

Governance failure timeline

Pre-deployment

Failure to review objective specifications and stress-test whether they accurately capture what the organisation values before agentic deployment

Acceptance of technical design review as equivalent to governance sign-off.

Deployment

Systems are achieving their stated objectives while producing outcomes the organisation did not intend and would not sanction.

The reputational, legal, and operational consequences arrive at point of action, and because the system is doing exactly what it was designed to do, there is no internal failure signal.

Post-deployment

The accumulated harm from objectives that were technically met but values-misaligned becomes visible.

Legal liability, reputational damage, and operational disruption follow once consequences are externally identified, and the organisation must account for a system that was working as designed.

other Failure Modes

Cascade and Systemic Failure

Where multiple AI systems interact, misalignment can propagate and amplify across interconnected systems before anyone sees it....

Memory Poisoning

Agentic AI systems with persistent memory can have malicious instructions stored in that memory and subsequently recalled and executed in...

Autonomous Action without Authorisation

Agentic AI systems take actions without requiring human sign-off at each step....

Prompt Injection and Adversarial Attacks

Attackers manipulate AI system inputs to override instructions, bypass controls, extract confidential information or cause unauthorised actions....

Misapplied Capability (Agentic)

Agentic AI systems are designed to take sequences of actions to achieve goals, a misapplication results in wrong actions with...

Automation Bias and Skill Atrophy

Staff systematically defer to AI outputs over their own professional judgement, even when they have the expertise to identify an...