AI Alignment (2024 Mar)

Replicating Toy Models of Universality

By Nathan Reed (Published on July 6, 2024)

For my March 2024 AI Safety Fundamentals project, I replicated Bilal Chughtai et al.’s paper “A Toy Model of Universality”. Due to time and skill constraints, I was only able to test the logit attributions (Section 5.1) and the embeddings and unembeddings (Section 5.2), using MLPs trained on the cyclic group C_113. This was nonetheless sufficient to confirm the overall thesis of the paper.
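To make the setting concrete, here is a minimal sketch of the task and of the mechanism the paper proposes for it, which it calls group composition via representations. It is an illustration under stated assumptions, not the replication code used in this project. It builds the full C_113 dataset (every ordered pair labelled with a + b mod 113) and scores each candidate output c by the character of the real 2D rotation representation of C_113, tr(ρ_k(a)ρ_k(b)ρ_k(c)^-1) = 2cos(2πk(a + b − c)/113), summed over a few frequencies. The frequency list `key_freqs` is a hypothetical placeholder; a trained network selects its own frequencies.

```python
import math

N = 113  # order of the cyclic group C_113 (group operation: addition mod N)

def compose(a: int, b: int) -> int:
    """Group operation of C_N: addition modulo N."""
    return (a + b) % N

def character(k: int, g: int) -> float:
    """Character of the real 2D rotation representation of C_N at frequency k,
    i.e. the trace of [[cos t, -sin t], [sin t, cos t]] with t = 2*pi*k*g/N."""
    return 2.0 * math.cos(2.0 * math.pi * k * g / N)

def gcr_logits(a: int, b: int, freqs: list[int]) -> list[float]:
    """Logit for every candidate c: sum over frequencies of chi_k(a + b - c).
    Because N is prime, each term hits its maximum of 2 only when c = a + b mod N."""
    return [sum(character(k, a + b - c) for k in freqs) for c in range(N)]

# Full dataset: every ordered pair (a, b) labelled with a + b mod N.
dataset = [((a, b), compose(a, b)) for a in range(N) for b in range(N)]

# Hypothetical "key frequencies" chosen for illustration only.
key_freqs = [4, 32, 43]
a, b = 17, 100
logits = gcr_logits(a, b, key_freqs)
prediction = max(range(N), key=lambda c: logits[c])
print(len(dataset), prediction, compose(a, b))
```

Running it prints `12769 4 4`: there are 113² ordered pairs, and the character-based logits peak exactly at the true answer, 17 + 100 mod 113 = 4.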

