AI Alignment 201 Course
AGI Safety Fundamentals
This curriculum is intended as a follow-up to the Alignment Course curriculum (the ‘101’ to this 201 course). We strongly recommend completing that curriculum first.
Key points
This curriculum aims to give participants enough knowledge about alignment to understand the frontier of current research discussions. It assumes that participants have read through the Alignment Fundamentals curriculum, taken a course on deep learning, and taken a course on reinforcement learning (or have an equivalent level of knowledge).
Although these are the basic prerequisites, we expect that most people who intend to work on alignment should read the full curriculum only after they have significantly more ML experience than listed above, since upskilling via their own ML engineering or research projects should generally be a higher priority for early-career alignment researchers.
When reading this curriculum, it’s worth remembering that the field of alignment aims to shape the goals of systems that don’t yet exist; and so alignment research is often more speculative than research in other fields. You shouldn’t assume that there’s a consensus about the usefulness of any given research direction; instead, it’s often worth developing your own views about whether techniques discussed in this curriculum might plausibly scale up to help align AGI.
The curriculum was compiled, and is maintained, by Richard Ngo; this version was last updated October 2022. For now, it’s primarily intended to be read independently; once we’ve run a small pilot program, we’ll likely extend it to a discussion-based course.
Course Details
The course consists of 7 weeks of readings, then two weeks focused on developing a literature review and/or research proposal.
As in the Alignment Fundamentals course, participants are divided into groups of 4-6 people, matched based on their prior knowledge of ML. Each week, each group and their discussion facilitator will meet for 1.5 hours to discuss the readings and exercises. Weeks 6 and 7 branch into three parallel tracks: participants can choose whether to focus on Eliciting Latent Knowledge, Agent Foundations, or Science of Deep Learning.
The main focus each week will be on the core readings and one exercise of your choice from those listed, for which you should allocate around 2 hours of preparation time. The approximate time needed to read each piece in depth is listed next to it.
Note that in some cases only a small section of the linked reading is assigned. In several cases, blog posts about machine learning papers are listed instead of the papers themselves; you’re only expected to read the blog posts, but for those with strong ML backgrounds, reading the full papers might be worthwhile.
In some cases, specific readings are given as assumed background for the week; participants who haven’t already read them should do so in addition to the core readings. (These are typically readings from the Alignment Fundamentals course, though sometimes from newer versions of it which previous participants may not have seen.) Participants who have already read some of the core readings, or who want to learn more about a topic, should try the further readings; however, none of them are compulsory.
Syllabus
Audio versions of some core readings are available on Apple Podcasts, Google Podcasts, Spotify and this RSS feed.