Root Cause Analysis of AI Safety Incidents
This project was a runner-up for the 'AI Strategy' prize on our AI Alignment (March 2024) course.
Applying Root Cause Analysis (RCA) tools, which are widely used in other areas of engineering, to AI safety incidents could provide insights into their causes, reduce the likelihood of recurrence, and support the prioritisation of risk mitigation and harm reduction measures.
This post proposes and demonstrates an approach for using language models to process incident reports and infer potential causes.
Used with due consideration of its capabilities and shortcomings, the approach could offer a scalable methodology for aggregating causality data from historical incidents, and potentially from modelled future scenarios, helping to reduce both the number of future safety incidents and the severity of the harm they cause.
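To make the pipeline concrete, here is a minimal sketch of the aggregation step. The cause categories, keyword mapping, and function names are all hypothetical, and the trivial keyword matcher stands in for an actual language-model call purely so the example runs; the post's real approach would delegate `infer_causes` to a model reading the full report text.

```python
from collections import Counter

def infer_causes(report: str) -> list[str]:
    """Stand-in for a language-model call that reads an incident
    report and returns candidate root causes. A keyword match is
    used here only to keep the sketch self-contained and runnable."""
    keywords = {
        "reward": "specification gaming",
        "deployment": "distribution shift",
        "attacker": "misuse",
        "training data": "data quality",
    }
    return [cause for kw, cause in keywords.items() if kw in report.lower()]

def aggregate(reports: list[str]) -> Counter:
    """Aggregate inferred causes across a corpus of incident reports,
    producing the frequency view that could support prioritisation
    of mitigation measures."""
    counts: Counter = Counter()
    for report in reports:
        counts.update(infer_causes(report))
    return counts

reports = [
    "Model exploited its reward function during fine-tuning.",
    "Behaviour degraded after deployment to a new user population.",
    "An attacker crafted prompts to bypass safety filters.",
]
print(aggregate(reports).most_common())
```

The value lies in the aggregation: once per-incident causal inferences are cheap to produce at scale, recurring causes surface as frequency patterns across the corpus.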
Read the full piece here.