What it means
AI systems fail when they encounter inputs outside their training distribution: unusual data, edge cases, unexpected user behaviours, or novel scenarios. Unlike model drift, this is not temporal degradation; the system was never reliable at its boundaries. It performs well under standard conditions and fails unpredictably at the margins.
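A minimal sketch may make the boundary concrete. Assuming a system whose inputs can be summarised as numeric feature vectors (the statistics, threshold, and names below are illustrative, not drawn from any particular deployment), a crude per-feature z-score check against stored training statistics can flag inputs the system was never trained to handle:

```python
import numpy as np

# Per-feature statistics computed once from the training set
# (the numbers here are placeholders, not real values).
TRAIN_MEAN = np.array([0.0, 5.0, 100.0])
TRAIN_STD = np.array([1.0, 2.0, 15.0])

def is_out_of_distribution(x: np.ndarray, z_threshold: float = 4.0) -> bool:
    """Flag inputs whose features sit far from the training distribution.

    A per-feature z-score is the crudest possible boundary check; real
    systems would use density models, ensemble disagreement, or conformal
    prediction, but the governance point is the same: the check must exist.
    """
    z_scores = np.abs((x - TRAIN_MEAN) / TRAIN_STD)
    return bool(np.any(z_scores > z_threshold))

# An input near the training data passes; an extreme one is flagged.
print(is_out_of_distribution(np.array([0.1, 4.8, 103.0])))  # False
print(is_out_of_distribution(np.array([9.0, 5.0, 100.0])))  # True
```

Production detectors are far more sophisticated, but even this sketch shows that the boundary can be monitored at runtime; an unmonitored boundary is a choice, not an inevitability.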
Why it matters
AI systems deployed in high-stakes contexts tend to encounter unusual inputs precisely when reliability matters most. A system that fails at the boundary is not a system the board can rely upon in the scenarios that require it.
Board governance implications
Pre-deployment testing must include edge case and adversarial scenario testing, not just standard performance benchmarking. The board’s sign-off question: have we tested this system under the conditions most likely to produce failure, and are the results documented?
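To make the sign-off question auditable, a testing sketch may help. Everything below is hypothetical (the `predict` interface, the edge cases, the expected labels); the point is only that boundary results can be captured as documented evidence rather than a verbal assurance:

```python
# Illustrative edge case suite; the cases and expected outputs are assumptions.
EDGE_CASES = [
    # (description, input, expected output)
    ("empty input",      "",                      "reject"),
    ("max-length input", "x" * 10_000,            "reject"),
    ("adversarial typo", "transferr $1O,OOO now", "flag_fraud"),
]

def run_edge_case_suite(model) -> dict:
    """Run the model against boundary conditions and return documented
    results, so sign-off rests on evidence rather than benchmark scores."""
    results = {}
    for description, x, expected in EDGE_CASES:
        try:
            predicted = model.predict(x)
            results[description] = ("pass" if predicted == expected else "fail",
                                    predicted)
        except Exception as exc:  # a crash at the boundary is itself a failure
            results[description] = ("fail", repr(exc))
    return results

class _StubModel:
    """Stand-in for the real system, purely so the sketch runs end to end."""
    def predict(self, x):
        return "reject"

print(run_edge_case_suite(_StubModel()))
```

The output of such a suite, versioned alongside the deployment decision, is the documentation the sign-off question asks for.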
Governance failure timeline
Pre-deployment
Failure to conduct edge case and adversarial scenario testing before deployment sign-off; acceptance of standard performance benchmarking as sufficient governance for high-stakes use cases.
Deployment
The system fails in precisely the high-stakes or novel scenarios where reliability is most required.
Professional liability and reputational exposure arrive together at the point of use.
Post-deployment
Failures recur as novel and unusual inputs continue to arise in live operation.
Professional liability claims accumulate.
Reputational exposure compounds with each incident.
Regulatory scrutiny focuses on the testing methodology that approved the system: specifically, whether boundary testing was conducted or standard performance benchmarking was accepted in its place.