Dangerous or Harmful Content Generation

AI systems can generate content that enables or facilitates physical harm.

What it means

AI systems can generate content that enables or facilitates physical harm: for example, detailed instructions for violence, dangerous activities, or the development of CBRN (chemical, biological, radiological, nuclear) capabilities. This failure occurs when tools are deployed without adequate safeguards, or when external actors deliberately elicit harmful outputs.

Why it matters

Where an organisation’s AI tools generate harmful content, whether through deliberate elicitation or inadequate configuration, the reputational, legal, and regulatory consequences fall on the deploying organisation, not the model provider.

Board governance implications

Before deploying any AI tool accessible to the public, clients, or unvetted users, the board must confirm that safeguards against harmful content generation have been tested and documented. Default settings are not a governance position.
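What "tested and documented" can look like in practice is a repeatable red-team suite run before release. The sketch below is illustrative only: the function names, the placeholder prompts, and the refusal-marker heuristic are assumptions, not a real model API or a prescribed test standard.

```python
# Hypothetical pre-deployment safeguard check. All names here
# (generate_response, ADVERSARIAL_PROMPTS, REFUSAL_MARKERS) are
# illustrative placeholders, not a real vendor API.

# Placeholder red-team prompts; a real suite would hold a maintained,
# access-controlled corpus covering each harm category.
ADVERSARIAL_PROMPTS = [
    "<red-team prompt: violence>",
    "<red-team prompt: CBRN>",
    "<red-team prompt: dangerous activity>",
]

# Crude refusal heuristic for the sketch; production testing would use
# a proper classifier or human review.
REFUSAL_MARKERS = ("can't help", "cannot help", "unable to assist")

def generate_response(prompt: str) -> str:
    """Stub standing in for the deployed model's API call."""
    return "I can't help with that request."

def run_safeguard_suite() -> dict:
    """Return a per-prompt record of whether the model refused,
    suitable for filing as pre-deployment evidence."""
    record = {}
    for prompt in ADVERSARIAL_PROMPTS:
        reply = generate_response(prompt).lower()
        record[prompt] = any(marker in reply for marker in REFUSAL_MARKERS)
    return record

results = run_safeguard_suite()
failures = [p for p, refused in results.items() if not refused]
assert not failures, f"Safeguard gaps found: {failures}"
```

The point of the sketch is the shape of the evidence: a named suite, run before release, producing a stored pass/fail record the board can point to, rather than reliance on the provider's default settings.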

Governance failure timeline

Pre-deployment


Failure to test and document safeguards against harmful content generation before making any AI tool accessible to the public, clients, or unvetted users.

Acceptance of default settings as a governance position.

Deployment


The reputational consequences are immediate and severe.

Regulatory investigation begins.

Where AI-generated content enables physical harm, potential criminal liability follows.

The media crisis arrives at point of discovery, not point of generation, and the organisation is responding under pressure without preparation.

Post-deployment


The damage is sustained.

Civil litigation from those harmed runs alongside ongoing regulatory investigation.

The system must be withdrawn, remediated, or restricted.

Recovery depends on whether a governance record exists to demonstrate that controls were in place and were circumvented, rather than absent.
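A governance record of that kind can be as simple as an append-only log showing each time a control fired. The sketch below is an assumption-laden illustration: the field names and the JSON-lines format are not a prescribed standard, just one minimal way to make "controls were in place" demonstrable after the fact.

```python
# Illustrative governance-record sketch. The schema (prompt_category,
# control, action) and the JSON-lines file format are assumptions.
import json
import datetime

def log_safeguard_event(prompt_category: str, action: str, control: str,
                        path: str = "safeguard_log.jsonl") -> None:
    """Append a timestamped record showing a control fired, was flagged,
    or was bypassed - evidence that safeguards existed, not just policy."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt_category": prompt_category,
        "control": control,
        "action": action,  # e.g. "blocked", "flagged", "bypassed"
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Example: record that the output filter blocked a CBRN-category request.
log_safeguard_event("CBRN", "blocked", "output-filter-v2")
```

A log like this is what separates "controls were circumvented" from "controls were absent" when regulators and litigants reconstruct the timeline.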
