Obscene and Abusive Content Generation

AI systems can be manipulated to generate sexually explicit, discriminatory, abusive or harassing content.
Share this failure mode:

What it means

AI systems can be manipulated, through prompt injection, jailbreaking, or inadequate configuration, to generate sexually explicit, discriminatory, abusive, or harassing content. This can occur through external attack or internal misuse by staff.

Why it matters

Content of this nature generated by an organisation’s AI tools creates immediate reputational, legal, and HR exposure. In client-facing or public-facing contexts, a single incident can cause irreparable reputational damage.

Board governance implications

Acceptable use policy must explicitly prohibit generation of this content category. Safeguards must be tested before deployment, not assumed. Incident response must include a specific protocol for this failure type.

Governance failure timeline

Pre-deployment


Failure to test safeguards against abusive and explicit content generation before deployment.

Absence of acceptable use policy provisions explicitly prohibiting this content category before tools are made accessible.

Deployment


HR liability, reputational exposure, potential regulatory breach, and media crisis arrive at point of discovery.

In client-facing or public-facing contexts, a single incident causes damage that is disproportionate to the operational significance of the failure.

Post-deployment


The exposure is to repeat incidents, sustained HR and reputational liability, and regulatory scrutiny of whether safeguard adequacy was assessed before deployment.

Where incidents recur, the reputational damage compounds and the organisation’s position, that this was an isolated failure rather than a systemic one, becomes harder to maintain.

other Failure Modes