A Monte Carlo Simulation for estimating the risk of loss of control to AI
This project was completed during our AI Alignment course (June 2024). The text below is an excerpt from the final project.
Abstract
Despite recognition by frontier AI labs and policymakers of the potential for longer-term catastrophic risks from advanced AI models, standard methodologies for managing these risks remain elusive.
This experiment explores how Monte Carlo simulation, a risk modelling technique widely used for risk assessment in other industries, can help reduce uncertainty around the longer-term risk of losing control to AI.
Following the dominant approach of arguing for model safety on the basis of limited model capabilities and protective measures (also called controls), the Monte Carlo simulation quantifies and compares these factors to arrive at a range of risk likelihoods.
The work contributes:
- A Monte Carlo model that combines qualitative assessments of dangerous capabilities and protective measures into quantitative risk likelihood predictions (a minimal sketch follows this list)
- A potential approach for augmenting Responsible Scaling Policies with quantifiable ranges, demonstrating how claims about future safety could be tested and proven
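To make this concrete, below is a minimal sketch of the kind of simulation the model performs. It is an illustration rather than the project's actual implementation: the triangular distributions, the parameter ranges, and the simple risk formula risk = capability × (1 − control effectiveness) are all assumptions made for this example.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
N = 100_000  # number of Monte Carlo trials

# Hypothetical triangular distributions (min, mode, max) that translate
# qualitative expert assessments into quantitative ranges.
capability = rng.triangular(0.1, 0.3, 0.7, size=N)   # dangerous capability level, 0-1
controls = rng.triangular(0.5, 0.8, 0.95, size=N)    # protective measure effectiveness, 0-1

# Illustrative risk model: likelihood of loss of control rises with
# capability and falls with the effectiveness of protective measures.
risk = capability * (1.0 - controls)

print(f"Best case (5th percentile):   {np.percentile(risk, 5):.4f}")
print(f"Most likely (median):         {np.percentile(risk, 50):.4f}")
print(f"Worst case (95th percentile): {np.percentile(risk, 95):.4f}")
```

Each trial draws one plausible combination of capability and control levels; aggregating many trials yields the best, worst, and most likely risk estimates discussed below.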
Run the Monte Carlo model on Kaggle.
Code also available here: https://github.com/francescini/monte-carlo/
Figure: Monte Carlo model outputs provide quantitative data points for best, worst, and most likely risk scenarios. These could be used to help simulate testing against future capability levels in Responsible Scaling Policies (based on a diagram in METR, 2023).
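Continuing the hypothetical sketch above (and reusing its risk array), outputs like these could be tested against a quantified Responsible Scaling Policy commitment, for example by estimating the probability that simulated risk exceeds an acceptable level. The threshold value here is invented purely for illustration:

```python
# Hypothetical acceptable-risk threshold that an RSP might commit to.
RSP_THRESHOLD = 0.10

# Share of simulated scenarios in which risk exceeds the threshold:
# one way to turn a qualitative safety claim into a testable number.
p_exceed = float(np.mean(risk > RSP_THRESHOLD))
print(f"P(risk > {RSP_THRESHOLD}): {p_exceed:.3f}")
```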
Full project
You can view the full project here.