Memory Poisoning

Agentic AI systems with persistent memory can have malicious instructions stored in that memory and subsequently recalled and executed in future sessions.
Share this failure mode:

What it means

Agentic AI systems with persistent memory can have malicious instructions stored in that memory, through compromised inputs, adversarial attacks, or poisoned data sources, and subsequently recalled and executed in future sessions. The system acts on instructions it stored in a previous context, without the current user being aware those instructions exist.

Why it matters

Memory poisoning separates the malicious instruction from its execution in time. Monitoring designed to catch prompt injection at point of input will not detect an instruction embedded previously and executing later. For organisations with persistent AI agents, this creates an ongoing and invisible attack surface.

Board governance implications

The board must confirm that persistent memory systems have trust boundaries, limiting which components can write to memory, monitoring memory contents, and requiring validation before stored instructions are executed. Memory architecture must be reviewed as part of governance sign-off for any agentic system.

Governance failure timeline

Pre-deployment


Failure to conduct memory architecture review, define trust boundaries, and establish memory content monitoring as governance sign-off conditions before any persistent-memory agentic system is approved.

Deployment


Adversary-controlled instructions are being embedded in memory during live operation.

The system is storing poisoned instructions with no detection mechanism active, preparing to execute them in future sessions without any signal to the current user.

Post-deployment


The consequences arrive when those instructions execute: data exfiltration, unauthorised actions, or compromised outputs in future sessions with no visible connection to the original poisoning event.

The separation of instruction and execution in time is what makes this failure mode particularly difficult to detect and attribute.

other Failure Modes