AI alignment project evaluation criteria

By Adam Jones (Published on August 11, 2024)

Below are the criteria that we'll use to evaluate projects on our AI Alignment Course. Project submissions will be anonymised and evaluated by a panel of experts, including course facilitators.

Clarity of communication and presentation

  1. It's unclear what the key takeaways from the project are.
  2. The project has a clear question, hypothesis or measurable goal OR is presented clearly (i.e. easy for the reviewer to pick up on the key takeaways).
  3. The project has a clear question, hypothesis or measurable goal AND is presented clearly (i.e. easy for the reviewer to pick up on the key takeaways).
  4. The above, plus it's a fairly easy and accessible read throughout. You'd expect the target audience to easily understand the key takeaways and conclusions.
  5. The above, plus there's something special about the presentation. For example, it is particularly interactive/engaging/well-designed.

Relevance to the safety of advanced AI systems

By this, we mean the project relates to decreasing the expected harm from highly capable and broad AI systems (as opposed to less capable or narrow systems), i.e. it has a focus on improving things.

The rubric is:

  1. The project is mostly unrelated to advanced AI system safety. For example, it might focus on other digital harms like privacy breaches or online fraud, without specific relevance to advanced AI systems.
  2. The work has some relevance to advanced AI system safety, but might be fairly broad or generic. For example, a high-level overview of basic safety principles for current AI systems without addressing specific risks related to advanced AI systems.
  3. The work is clearly related to core advanced AI system safety concerns: addressing the challenges involved in ensuring highly capable, advanced AI systems remain safe and aligned with human values.
  4. The above, plus the work explicitly explains its relevance to advanced AI system safety. After reading, you'd feel confident explaining to someone else how this project could contribute to reducing catastrophic risks from highly capable, advanced AI systems.
  5. The above, plus it explains how it fills a gap in existing work or has a strong theory of change within advanced AI system safety. This can include projects that ended up with negative results (e.g. "I tried to implement this safety approach for advanced AI systems, and here's why it didn't work") or novel proposals for advanced AI system safety initiatives that haven't previously been suggested.

Quality of research

  1. The project appears rushed or poorly executed, lacking the depth or rigour expected of a serious 20-hour effort.[1]
  2. The project demonstrates a reasonable level of effort and understanding of AI safety concepts, but lacks novelty or depth of insight.
  3. The project presents a novel approach or insight in AI safety. It contributes new information or perspective to the field, even if modest in scope.
  4. The above, plus the project is impressive and captures your attention positively. It might offer guidance on next steps (e.g. explicit concrete steps in a further work section).
  5. The project blows you away. If you knew people working in this area, you'd send it on to them to read. Or if you were working in this area, based on this project you'd be excited to work with this person.

Footnotes

  1. 4 hours / week × 5 weeks = 20 hours

    Our course page explains that the time commitment is 5 hours per week. Averaged across the weeks, 1 hour of this covers the resources and the live session, leaving 4 hours per week for the project.

    There are 5 weeks from session 8 (the beginning of the project phase) to the submission deadline.
