Risk Thresholds for Frontier AI: Paper review
Paper review: how to set risk thresholds in frontier AI development?
Did you know that the threshold of individual risk for many activities is around one in a million per year? This figure is often used as a benchmark for judging whether risks are acceptable in various fields, including transportation, medicine, and even entertainment. Is it applicable to AI, then?
Decisions in the AI domain are high-stakes. Releasing or deploying a potentially harmful model can have severe consequences. Of course, there is a catch - AI is a dual-use technology that can bring both incredibly significant benefits and devastating harms. So how about clear thresholds that determine when additional safety measures are necessary? Setting them turns out to be pretty tricky.
Frontier AI risks (for example, cyberattacks, biological weapons, loss of control over AI systems) hinge on high-stakes decisions by AI companies - such as whether to release a model or pause for safety enhancements.
Thresholds help pin down the point beyond which things become too risky, and several types of them appear in AI-related discussions and regulatory practice.
- Compute thresholds are a proxy for a model's capabilities and risks. This metric is used by US and EU regulations to identify high-risk AI models.
- Capability thresholds focus on specific capabilities (chemical, bio, cyber, autonomy, for example). While not yet used by regulators, they're still a proxy for risk.
- Risk thresholds (combinations of likelihood and severity of harms) are the ideal metric but are challenging to measure (we simply do not have enough past data to base our predictions on, as we would in other, well-established industries). See the toy sketch right after this list.
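To make the difference between these threshold types concrete, here is a minimal Python sketch. It is my own illustration, not something from the paper: the function names and example numbers are assumptions, and the only real figure is the EU AI Act's presumption of systemic risk above 10^25 training FLOP (plus the one-in-a-million-per-year benchmark mentioned above).

```python
# Illustrative sketch only: names and example numbers are assumptions, not the paper's.
# The EU AI Act presumes "systemic risk" above 10^25 training FLOP; that figure is
# used here as the compute filter.

COMPUTE_THRESHOLD_FLOP = 1e25       # compute threshold: crude proxy, initial filter
ACCEPTABLE_RISK_PER_YEAR = 1e-6     # risk threshold: the "one in a million" benchmark


def needs_special_attention(training_flop: float) -> bool:
    """Compute threshold: flag potentially risky models for further scrutiny."""
    return training_flop >= COMPUTE_THRESHOLD_FLOP


def estimated_risk(likelihood_per_year: float, severity_weight: float) -> float:
    """Risk threshold ingredients: likelihood of a harm scenario times its severity.
    Estimating either input for frontier AI is the hard part - there is little
    historical data to anchor the numbers on."""
    return likelihood_per_year * severity_weight


if __name__ == "__main__":
    training_flop = 3e25    # hypothetical frontier training run
    likelihood = 1e-4       # hypothetical annual likelihood of a severe harm scenario
    severity = 0.05         # hypothetical severity weight (1.0 = worst case considered)

    if needs_special_attention(training_flop):
        print("Compute filter triggered: run capability evaluations.")
    if estimated_risk(likelihood, severity) > ACCEPTABLE_RISK_PER_YEAR:
        print("Estimated risk exceeds the acceptable level: pause or add mitigations.")
```

The point of the toy numbers is exactly the paper's point: the compute check is trivial to apply, while the likelihood and severity inputs are the part nobody can yet estimate reliably.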
The paper I'm summarising here is this one:
In short, it suggests that companies should:
- Define risk thresholds for decision-making (and this process deserves a separate post!)
- Use risk thresholds to set capability thresholds
- Primarily rely on capability thresholds for decisions (a toy sketch of this order follows below)

At this stage, regulators use compute thresholds - but, exactly as the paper recommends, only as an initial filter to identify potentially risky models that deserve "special attention" (aka requirements related to reporting, risk mitigation, accident tracking, cybersecurity efforts, and so on).
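As a toy illustration of that recommended order (risk thresholds inform capability thresholds, which drive the actual decisions), here is a hedged Python sketch. The evaluation name, scores, and the fixed mapping are hypothetical placeholders for the expert judgment, threat modelling, and evaluations this would require in practice.

```python
# Illustrative sketch of the recommended order, as I read the paper:
# risk thresholds -> capability thresholds -> release decisions.
# All names, benchmarks, and numbers below are hypothetical assumptions.

from dataclasses import dataclass


@dataclass
class CapabilityThreshold:
    evaluation: str     # hypothetical capability evaluation name
    max_score: float    # score above which the risk threshold is judged to be exceeded


def derive_capability_threshold(acceptable_risk: float) -> CapabilityThreshold:
    """Step 2: translate an abstract risk threshold into a measurable capability bar.
    In reality this mapping comes from threat modelling and expert judgment;
    the fixed values here are purely illustrative."""
    return CapabilityThreshold(evaluation="bio_uplift_eval", max_score=0.3)


def release_decision(eval_score: float, threshold: CapabilityThreshold) -> str:
    """Step 3: rely primarily on the capability threshold for the actual decision."""
    if eval_score > threshold.max_score:
        return "pause release and strengthen safeguards"
    return "proceed with standard safeguards"


if __name__ == "__main__":
    threshold = derive_capability_threshold(acceptable_risk=1e-6)  # Step 1 feeds in here
    print(release_decision(eval_score=0.42, threshold=threshold))
```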
Risk thresholds are the ultimate goal, but for now we have to rely on proxies, and the paper discusses why and how - a must-read for anyone who wants to improve their AI literacy, too.