Threat Hunting Across Domains: From Cyber Security to AI Alignment
A key component of AI alignment research is understanding the threats posed by AI models. Identifying and hunting for those threats is a difficult process. However, while the field of AI alignment is in its infancy, threat hunting more broadly is substantially more developed: it has been carried out for decades in the context of cyber security.
In this post, I explore how threat hunting has evolved in cyber security over the years, summarise the hunting methodologies that have been created, and consider whether any of this prior work can feasibly be applied to the domain of AI alignment (spoiler alert: it can).
Read the full piece here.