Prompt Injection and Adversarial Attacks

Attackers manipulate AI system inputs to override instructions, bypass controls, extract confidential information or cause unauthorised actions.
Share this failure mode:

What it means

Attackers manipulate AI system inputs, through direct user prompts or indirect injection via external content the system processes, to override instructions, bypass controls, extract confidential information, or cause unauthorised actions.

Why it matters

Prompt injection exploits the fundamental design of AI systems, not a patchable flaw. For agentic systems with access to sensitive data and the ability to take actions, a successful injection can exfiltrate data, execute unauthorised transactions, or impersonate organisational communications. The internal governance failure is the absence of controls that limit the blast radius of a successful attack.

Board governance implications

The board must confirm that AI systems have least-privilege access:

  • Sensitive systems require human approval for high-impact actions,
  • Input and output monitoring is in place, and
  • Incident response planning accounts for prompt injection scenarios

The absence of these controls is the internal governance failure.

Governance failure timeline

Pre-deployment


Failure to implement least-privilege access controls, input and output monitoring, and incident response planning before deploying any AI system accessible to external inputs.

Absence of prompt injection as a tested attack scenario in pre-deployment security review.

Deployment


Unauthorised data exfiltration, compromised AI-assisted decisions, and potential system compromise are live from point of exploitation.

The reputational and regulatory exposure follows immediately.

Post-deployment


The attack surface persists if the underlying vulnerability is not addressed.

Data exfiltration risk is ongoing.

Reputational and regulatory exposure accumulates as incidents develop.

Forensic investigation establishes what controls were absent at deployment, and the absence of least-privilege access controls, input and output monitoring, and incident response planning becomes the documented governance failure.

other Failure Modes