AI Alignment (2024 Mar)

Trying to Automate Detection of Translation Heads

By Erik Nordby (Published on July 5, 2024)

For my project, I decided to work on one of the 200 Open Problems in Mechanistic Interpretability. Specifically, I tried to automate ways of finding translation heads.
