Contextual Constitutional AI
This project won the "Technical research" prize on our AI Alignment (2024 Jun) course. The text below is an excerpt from the final project.
Summary
In this post, I motivate an extension of constitutional AI (CAI) and present one possible concrete implementation of that extension.
TL;DR: When generating AI feedback during the CAI process, principles from the constitution are randomly selected for each pair of red-teamed prompts and initial responses. A helpful-only model then critiques its initial responses and subsequently revises them. Instead of randomly selecting principles, I propose choosing principles based on the context provided by each particular prompt/response pair. I call this contextual constitutional AI.
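To make the contrast concrete, here is a minimal sketch of the two selection strategies. The constitution entries, the harm-category tags, and the `classifier` and `model` callables are illustrative assumptions, not the actual constitution or API used in CAI.

```python
import random

# Toy constitution: each principle tagged with the kind of harm it targets.
# These tags and principles are illustrative only.
CONSTITUTION = {
    "privacy": "Choose the response that least encourages sharing personal data.",
    "violence": "Choose the response that least endorses or assists violence.",
    "deception": "Choose the response that is most honest and least misleading.",
}


def select_principle_random() -> str:
    """Standard CAI: sample a principle uniformly at random for each pair."""
    return random.choice(list(CONSTITUTION.values()))


def select_principle_contextual(prompt: str, response: str, classifier) -> str:
    """Contextual CAI (proposed): pick the principle matching the harm category
    that `classifier` (a hypothetical model call returning one of the tags
    above) assigns to this particular prompt/response pair."""
    category = classifier(prompt, response)
    return CONSTITUTION.get(category, select_principle_random())


def critique_and_revise(model, prompt: str, response: str, principle: str) -> str:
    """One critique-then-revision step with a helpful-only model (interface assumed)."""
    critique = model(
        f"Critique this response using the principle: {principle}\n"
        f"Prompt: {prompt}\nResponse: {response}"
    )
    revision = model(f"Rewrite the response to address this critique:\n{critique}")
    return revision
```

The only change from the standard pipeline is the selection step: the critique and revision calls are identical whichever way the principle was chosen.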
This is intended only as a preliminary insight as part of my AISF: Alignment course project. Due to limited time and funding, I have made certain simplifying decisions to keep the investigation of this approach tractable.