Understanding Transformer’s Induction Heads
Transformer-based AI models such as Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks. Even though these models are typically trained only to predict the next token in a sequence, they appear able to generalise to a much broader set of tasks. Understanding how transformer models achieve such capabilities is an active area of research within AI Safety, known as Mechanistic Interpretability.
In this article I will explain what induction heads are and the intuition behind them. I will also describe how to visualise attention heads using libraries such as TransformerLens and CircuitsVis, and how to identify induction heads, a special type of attention head. For an in-depth introduction to induction heads and the broader topic of Mechanistic Interpretability, I recommend the ARENA course, as well as the articles In-context Learning and Induction Heads and A Mathematical Framework for Transformer Circuits.
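As a taste of what the full article covers, here is a minimal sketch of how such an analysis might look with TransformerLens and CircuitsVis. It loads GPT-2 small, runs it on a repeated sequence of random tokens (the classic induction-head probe), scores each head by how strongly it attends from each token in the second half back to the token just after that token's first occurrence, and visualises one layer's attention patterns. The model name, sequence length, and score threshold below are illustrative choices, not values taken from the article.

```python
import torch
import circuitsvis as cv
from transformer_lens import HookedTransformer

# Load a small pretrained model (illustrative choice; any HookedTransformer works).
model = HookedTransformer.from_pretrained("gpt2-small")

# Build a batch of random token sequences, each repeated twice. An induction head
# should attend from a token in the second half back to the token that followed
# its first occurrence, which lets the model predict the repetition.
seq_len, batch_size = 50, 4
rand_tokens = torch.randint(1000, 10000, (batch_size, seq_len))
rep_tokens = torch.cat([rand_tokens, rand_tokens], dim=-1)

_, cache = model.run_with_cache(rep_tokens, return_type=None)

# Score every head: mean attention along the "induction stripe", i.e. the
# diagonal at offset (1 - seq_len) of the [dest_pos, src_pos] attention pattern.
induction_scores = torch.zeros(model.cfg.n_layers, model.cfg.n_heads)
for layer in range(model.cfg.n_layers):
    pattern = cache["pattern", layer]  # shape: [batch, head, dest_pos, src_pos]
    stripe = pattern.diagonal(offset=1 - seq_len, dim1=-2, dim2=-1)
    induction_scores[layer] = stripe.mean(dim=(0, -1))

# Heads with a high score are induction-head candidates (0.4 is an arbitrary cutoff).
for layer, head in (induction_scores > 0.4).nonzero():
    print(f"Layer {layer.item()}, head {head.item()}: "
          f"score {induction_scores[layer, head]:.2f}")

# Visualise the attention patterns of the strongest layer in a notebook.
layer_to_show = int(induction_scores.max(dim=-1).values.argmax())
cv.attention.attention_patterns(
    tokens=model.to_str_tokens(rep_tokens[0]),
    attention=cache["pattern", layer_to_show][0],
)
```

In the CircuitsVis output, an induction head shows up as a distinctive stripe: each token in the second half of the sequence attends to the position one step after its earlier occurrence, exactly the pattern the score above measures.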
Read the full piece here.