AI Alignment (2024 Mar)

A Research Agenda for Psychology and AI

By Carter Allen (Published on June 30, 2024)

This project was a runner-up for the 'AI strategy' prize in our AI Alignment (March 2024) course.

I think very few people have thought rigorously about how psychology research could inform the trajectory of AI or humanity’s response to it. Despite this, there seem to be many important contributions psychology could make to AI safety. For instance, a few broad paths-to-impact that psychology research might have are:

  1. Helping people anticipate the societal response to possible developments in AI. In which areas is public opinion likely to be the bottleneck to greater AI safety?
  2. Improving forecasting/prediction techniques more broadly and applying them to forecasts of AI trajectories (the Forecasting Research Institute’s Existential Risk Persuasion Tournament is a good example).
  3. Describing human values and traits more rigorously to inform AI alignment, or to inform decisions about who to put behind the wheel in an AI takeoff scenario.
  4. Doing cognitive/behavioral science on AI models. For instance, developing diagnostic tools that can be used to assess how susceptible an AI decision-maker is to various biases.
  5. Modeling various risks related to institutional stability. For instance, arms races, risks posed by malevolent actors, various parties’ incentives in AI development, and decision-making within/across top AI companies.

I spent several weeks thinking about specific project ideas in these topic areas as part of my final project for BlueDot Impact’s AI Safety Fundamentals course. I’m sharing my ideas here because a) there are probably large topic areas I’m missing, and I’d like people to point them out to me, b) I’m starting my PhD in a few months and want to pursue some of these ideas, but I haven’t thought much about which of them are more or less valuable, and c) I would love for anyone else to adopt any of the ideas here or reach out to me about collaborating! I also hope to write a future version of this post that incorporates more existing research (I haven’t thoroughly checked which of these project ideas have already been done).

Read the full piece here.
