Analogous Risks and Lessons for AI Model Red Teaming from Gain-of-Function Virology Research
This project is one of three winners of the "Outstanding Governance Project" prize for our AI Governance (August 2024) course. The text below is an excerpt from the final project.
Recently, while studying AI governance, policy, and regulation in BlueDot Impact's AI Governance course, I became absorbed in reading about AI model red teaming (AIMRT), the adversarial practice of probing advanced AI models for weaknesses and unknown capabilities. Coming from a bioethics and global health policy background, I immediately saw parallels with gain-of-function research (GoFR) in virology, a field notorious for its high-risk experiments on pathogens. Both domains grapple with balancing innovation and safety, and as I considered the two practices side by side, more analogies emerged. One dissimilarity struck me, however: GoFR is widely regarded as dangerous and controversial and is, as a result, highly regulated, whereas AIMRT, to my knowledge, has received nothing but positive regard and encouragement, even in policy and regulation circles. This discrepancy in how two seemingly deeply analogous practices are perceived and treated intrigued me. Were the analogies merely superficial, or were there lessons to be learned from GoFR for safer and better-regulated practice in the sphere of AIMRT? I decided to take a closer look.
To view the full project submission, click here.