I read every major AI lab’s safety plan so you don’t have to
This project was the runner-up for the "Best Governance Explainer" prize in our AI Governance (August 2024) course. The text below is an excerpt from the final project.
A handful of tech companies are competing to build advanced, general-purpose AI systems that radically outsmart all of humanity. Each acknowledges that this will be a highly – perhaps existentially – dangerous undertaking. How do they plan to mitigate these risks?
Three industry leaders have released safety frameworks outlining how they intend to avoid catastrophic outcomes. They are OpenAI’s Preparedness Framework, Anthropic’s Responsible Scaling Policy and Google DeepMind’s Frontier Safety Framework.
Despite having followed AI safety issues avidly for almost two years, and having heard plenty about these frameworks and how promising (or disappointing) others find them, I had never actually read them in full. So I decided to do that, and to write a simple summary that might be useful for others.
I tried to write this assuming no prior knowledge. It is aimed at a reader who has heard that AI companies are doing something dangerous, and would like to know how they plan to address that. In the first section, I give a high-level summary of what each framework actually says. In the second, I offer some of my own opinions.
Note that I haven’t covered every aspect of the three frameworks here; I’ve focused on risk thresholds, capability evaluations and mitigations. The frameworks also include other sections, mainly covering each lab’s governance and transparency policies, which I’ve largely left out. And the obvious disclaimer: I have not been comprehensive, and have probably missed some nuances despite my best efforts to capture all the important bits!
To view the full project submission, click here.