
Exploring MARL Safety in meltingpot

By Gema Parreño, Peter Francis, Cam Tice, Chris Pond, Yohan Mathew, Tomasz Steifer, Marina Levay (Published on July 4, 2024)

The project is based on a multi-agent reinforcement learning (MARL) simulation that explores the tragedy of the commons dilemma. We explored the following research questions:

  • In scenarios where the tragedy of the commons can be assumed, what elicits cooperation in multi-agent systems?
  • What kind of relevant AI Safety insights can we extract from exploring cooperative multi-agent systems?

To address these questions, we conducted several experiments that:

  1. Introduced evaluations, measuring agent generalization and capabilities in various setups: comparing focal population performance across scenarios, and focal versus background populations (see the evaluation sketch after this list).
  2. Introduced changes to the game environment and dynamics. By changing resource respawn rates we examined the risks of overharvesting and the agents’ ability to maintain equilibrium in scarce and abundant environments (see the respawn-rate sketch below). By modifying the environment we tested the resilience of AI systems to environmental changes, introducing concepts like private property.
  3. Disabled the ability of agents to punish each other. We examined the risks associated with removing punitive measures and social-norm enforcement (see the action-masking sketch below). We called this agent no_zap and created a specific substrate called commons_harvest_disabled_punishment.
  4. Modified the reward signal during training. From an AI Safety standpoint, this experiment examines the potential for misaligned incentives leading to selfish behavior (see the reward-shaping sketch below). We called these agents Farmer and created a specific substrate called commons_harvest_farmer, together with a specific farmer Lua component.
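
Item 1 follows Melting Pot’s evaluation protocol, which scores a focal population by its per-capita return in each scenario. The sketch below shows one way to compute that metric, assuming a dm_env-style scenario whose timestep.reward carries one entry per focal player and policies that expose a step(observation) method; both are illustrative assumptions, not the project’s exact harness.

```python
import numpy as np

def focal_per_capita_return(env, policies):
    """Roll out one episode and return the mean episode return
    across focal players (background players act and are scored
    inside the scenario environment itself)."""
    timestep = env.reset()
    returns = np.zeros(len(policies))
    while not timestep.last():
        # One action per focal player, chosen from that player's observation.
        actions = [policy.step(obs)
                   for policy, obs in zip(policies, timestep.observation)]
        timestep = env.step(actions)
        returns += np.asarray(timestep.reward)
    return float(returns.mean())
```

Comparing this number across scenarios, and against the background population’s returns, is what separates “generalizes to new co-players” from “only works against copies of itself.”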
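For item 2, scarcity can be manipulated by scaling how quickly the common resource respawns. Here is a minimal sketch of such an override; the config key and baseline probabilities are assumptions for illustration, not Melting Pot’s actual config schema.

```python
# Baseline: regrowth probability rises with the number of neighboring apples
# (key name and values are hypothetical placeholders).
base_config = {"regrowth_probabilities": [0.0, 0.0025, 0.005, 0.025]}

def scale_regrowth(config, factor):
    """Return a copy of the config with regrowth scaled by `factor`."""
    out = dict(config)
    out["regrowth_probabilities"] = [
        min(1.0, p * factor) for p in out["regrowth_probabilities"]
    ]
    return out

scarce = scale_regrowth(base_config, 0.5)    # overharvesting bites sooner
abundant = scale_regrowth(base_config, 2.0)  # equilibrium easier to sustain
```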
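Item 3 was implemented as a dedicated substrate (commons_harvest_disabled_punishment). An alternative, lighter-weight way to remove punishment, sketched here with assumed action indices, is to mask the zap action at the environment boundary:

```python
NOOP = 0  # illustrative indices; the real substrate defines its own action set
ZAP = 7   # the punishment-beam action in this sketch

class NoZapWrapper:
    """Forwards calls to the wrapped substrate but silently replaces
    the zap action with a no-op, so agents cannot punish each other."""

    def __init__(self, env):
        self._env = env

    def reset(self):
        return self._env.reset()

    def step(self, actions):
        return self._env.step([NOOP if a == ZAP else a for a in actions])
```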
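Item 4 changes what agents are paid for during training. The project did this with a custom farmer Lua component; we do not reproduce that reward here. The sketch below only illustrates the general mechanism, reward shaping at training time, with a hypothetical conservation bonus whose sign and size determine whether the learned behavior tilts prosocial or selfish.

```python
def shaped_rewards(env_rewards, apples_remaining, weight=0.01):
    """Add a bonus tied to the surviving common-pool stock. The bonus
    form and weight are hypothetical, not the project's Farmer reward."""
    bonus = weight * apples_remaining
    return [r + bonus for r in env_rewards]
```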

We also discussed risks in multi-agent systems and highlighted one specific situation from our exercise in Melting Pot and the tragedy of the commons dilemma.


Read the full piece here.
