
A Monte Carlo Simulation for estimating the risk of loss of control to AI

By Francesca Gomez (Published on October 17, 2024)

This project was completed during our AI Alignment (2024 Jun) course. The text below is an excerpt from the final project.

Abstract

Although frontier AI labs and policymakers recognise the potential for longer-term catastrophic risks from advanced AI models, standard methodologies for managing these risks remain elusive.

This experiment explores how Monte Carlo simulation, a risk-modelling technique widely used in other industries, can help reduce uncertainty around the longer-term risk of losing control to AI.

Following the dominant approach of defending model safety on the basis of limited model capabilities and protective measures (also called controls), the Monte Carlo simulation quantifies and compares these factors to arrive at a range of risk likelihoods.
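As a rough illustration of this kind of approach, the sketch below samples hypothetical probability distributions for model capability and for the failure of each control, then combines them per trial to produce a distribution of loss-of-control risk. The distribution shapes, parameters and number of controls here are illustrative assumptions, not the project's calibrated inputs.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
N_TRIALS = 100_000  # number of Monte Carlo trials

# Assumed inputs (for illustration only):
# probability that the model reaches a dangerous capability level,
# sampled once per trial from a Beta distribution.
p_capability = rng.beta(2, 8, N_TRIALS)

# Probability that each of three illustrative protective measures
# (controls) fails, sampled independently per trial.
p_control_failure = rng.beta(2, 10, (N_TRIALS, 3))

# Per-trial risk: probability the model is dangerously capable AND
# all controls fail, assuming independence of these factors.
p_loss = p_capability * p_control_failure.prod(axis=1)

print(f"median risk: {np.median(p_loss):.2e}")
print(f"5th-95th percentile range: {np.percentile(p_loss, 5):.2e} "
      f"to {np.percentile(p_loss, 95):.2e}")
```

Running many trials like this turns point estimates about capabilities and controls into a distribution of outcomes, which is what allows a range of risk likelihoods to be reported rather than a single number.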

The work contributes:

Run the Monte Carlo model on Kaggle

The code is also available on GitHub: https://github.com/francescini/monte-carlo/

Monte Carlo model outputs provide quantitative data points for best, worst and most likely risk scenarios. This could be used to help simulate testing against future capability levels in Responsible Scaling Policies (based on a diagram in METR, 2023).
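Continuing the sketch above under the same illustrative assumptions, the best, worst and most likely data points could be read off the simulated distribution roughly as follows; the percentile cut-offs and the histogram-mode definition of "most likely" are assumptions, not the project's chosen summary statistics.

```python
# Summarise the simulated risk distribution into three data points
# (assumed definitions: best = 5th percentile, worst = 95th percentile,
# most likely = centre of the most populated histogram bin).
best_case = np.percentile(p_loss, 5)
worst_case = np.percentile(p_loss, 95)

counts, edges = np.histogram(p_loss, bins=50)
bin_width = edges[1] - edges[0]
most_likely = edges[counts.argmax()] + bin_width / 2

print(f"best case:   {best_case:.2e}")
print(f"worst case:  {worst_case:.2e}")
print(f"most likely: {most_likely:.2e}")
```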

Full project

You can view the full project here.
