AI Alignment (2024 Jun)

LLM Agent Honeypot: Monitoring AI Hacking Agents in the Wild

By Reworr R (Published on October 17, 2024)

This project was runner-up for the "Technical Research" prize on our AI Alignment (June 2024) course. The text below is an excerpt from the final project.

Project Overview

The LLM-Hack Agent Honeypot is a project designed to monitor, capture, and analyze autonomous AI Hacking Agents in the real world.

How It Works:

Simulation: We deploy a simulated "vulnerable" service to attract potential threats.
Catching Mechanisms: This service incorporates specific counter-techniques designed to detect and capture AI-Hacking Agents.
Monitoring: We monitor and log all interactions, waiting for potential attacks from LLM-powered agents.
Capture and Analysis: When an AI agent engages with our system, we capture the attempt and their system prompt details.

Why?

Our objectives aim to improve awareness of AI Hacking Agents and their current state of risks by understanding their real-world usage and studying their algorithms and behavior in the wild.

Full project

You can view the full project here.