Prompt Injection and Adversarial Attacks

What it means

Attackers manipulate AI system inputs, through direct user prompts or indirect injection via external content the system processes, to override instructions, bypass controls, extract confidential information, or cause unauthorised actions.

Why it matters

Prompt injection exploits the fundamental design of AI systems, not a patchable flaw. For agentic systems with access to sensitive data and the ability to take actions, a successful injection can exfiltrate data, execute unauthorised transactions, or impersonate organisational communications. The internal governance failure is the absence of controls that limit the blast radius of a successful attack.

Board governance implications

The board must confirm that AI systems have least-privilege access:

Sensitive systems require human approval for high-impact actions,
Input and output monitoring is in place, and
Incident response planning accounts for prompt injection scenarios

The absence of these controls is the internal governance failure.

Governance failure timeline

Pre-deployment

Failure to implement least-privilege access controls, input and output monitoring, and incident response planning before deploying any AI system accessible to external inputs.

Absence of prompt injection as a tested attack scenario in pre-deployment security review.

Deployment

Unauthorised data exfiltration, compromised AI-assisted decisions, and potential system compromise are live from point of exploitation.

The reputational and regulatory exposure follows immediately.

Post-deployment

The attack surface persists if the underlying vulnerability is not addressed.

Data exfiltration risk is ongoing.

Reputational and regulatory exposure accumulates as incidents develop.

Forensic investigation establishes what controls were absent at deployment, and the absence of least-privilege access controls, input and output monitoring, and incident response planning becomes the documented governance failure.

Prompt Injection and Adversarial Attacks

What it means

Why it matters

Board governance implications

Governance failure timeline

Pre-deployment

Deployment

Post-deployment

other Failure Modes

Cascade and Systemic Failure

Memory Poisoning

Autonomous Action without Authorisation

Goal Misalignment

Misapplied Capability (Agentic)

Automation Bias and Skill Atrophy