Dangerous or Harmful Content Generation

AI systems can generate content that enables or facilitates physical harm.

What it means

AI systems can generate content that enables or facilitates physical harm: for example, detailed instructions for violence, dangerous activities, or the development of CBRN (chemical, biological, radiological, nuclear) capabilities. This failure occurs when tools are deployed without adequate safeguards, or when external actors deliberately elicit harmful outputs.

Why it matters

Where an organisation’s AI tools generate harmful content, whether through deliberate elicitation or inadequate configuration, the reputational, legal, and regulatory consequences fall on the deploying organisation, not the model provider.

Board governance implications

Before deploying any AI tool accessible to the public, clients, or unvetted users, the board must confirm that safeguards against harmful content generation have been tested and documented. Default settings are not a governance position.
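What "tested and documented" can look like in practice is a repeatable red-team suite run before release. The sketch below is illustrative only: the function names, the placeholder prompts, and the refusal-marker heuristic are assumptions, not a real model API or a prescribed test standard.

```python
# Hypothetical pre-deployment safeguard check. All names here
# (generate_response, ADVERSARIAL_PROMPTS, REFUSAL_MARKERS) are
# illustrative placeholders, not a real vendor API.

# Placeholder red-team prompts; a real suite would hold a maintained,
# access-controlled corpus covering each harm category.
ADVERSARIAL_PROMPTS = [
    "<red-team prompt: violence>",
    "<red-team prompt: CBRN>",
    "<red-team prompt: dangerous activity>",
]

# Crude refusal heuristic for the sketch; production testing would use
# a proper classifier or human review.
REFUSAL_MARKERS = ("can't help", "cannot help", "unable to assist")

def generate_response(prompt: str) -> str:
    """Stub standing in for the deployed model's API call."""
    return "I can't help with that request."

def run_safeguard_suite() -> dict:
    """Return a per-prompt record of whether the model refused,
    suitable for filing as pre-deployment evidence."""
    record = {}
    for prompt in ADVERSARIAL_PROMPTS:
        reply = generate_response(prompt).lower()
        record[prompt] = any(marker in reply for marker in REFUSAL_MARKERS)
    return record

results = run_safeguard_suite()
failures = [p for p, refused in results.items() if not refused]
assert not failures, f"Safeguard gaps found: {failures}"
```

The point of the sketch is the shape of the evidence: a named suite, run before release, producing a stored pass/fail record the board can point to, rather than reliance on the provider's default settings.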

Governance failure timeline

Pre-deployment


Failure to test and document safeguards against harmful content generation before making any AI tool accessible to the public, clients, or unvetted users.

Acceptance of default settings as a governance position.

Deployment


The reputational consequences are immediate and severe.

Regulatory investigation begins.

Where AI-generated content enables physical harm, potential criminal liability follows.

The media crisis arrives at point of discovery, not point of generation, and the organisation is responding under pressure without preparation.

Post-deployment


The damage is sustained.

Civil litigation from those harmed runs alongside ongoing regulatory investigation.

The system must be withdrawn, remediated, or restricted.

Recovery depends on whether a governance record exists to demonstrate that controls were in place and were circumvented, rather than absent.
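A governance record of that kind can be as simple as an append-only log showing each time a control fired. The sketch below is an assumption-laden illustration: the field names and the JSON-lines format are not a prescribed standard, just one minimal way to make "controls were in place" demonstrable after the fact.

```python
# Illustrative governance-record sketch. The schema (prompt_category,
# control, action) and the JSON-lines file format are assumptions.
import json
import datetime

def log_safeguard_event(prompt_category: str, action: str, control: str,
                        path: str = "safeguard_log.jsonl") -> None:
    """Append a timestamped record showing a control fired, was flagged,
    or was bypassed - evidence that safeguards existed, not just policy."""
    entry = {
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "prompt_category": prompt_category,
        "control": control,
        "action": action,  # e.g. "blocked", "flagged", "bypassed"
    }
    with open(path, "a") as f:
        f.write(json.dumps(entry) + "\n")

# Example: record that the output filter blocked a CBRN-category request.
log_safeguard_event("CBRN", "blocked", "output-filter-v2")
```

A log like this is what separates "controls were circumvented" from "controls were absent" when regulators and litigants reconstruct the timeline.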
