AI Debate Stability: Addressing Self-Defeating Responses
By Annie Sorkin (Published on July 5, 2024)
- Transferring debate to an abstract algebra MMLU dataset is not trivial.
- When GPT-3.5 is used as a judge, the outcomes may be sensitive to exact prompt phrasing.
- GPT-3.5 may perform worse in judging the debate than answering the question directly.
- We proposed a universal prompting approach that avoids most of the self-defeating behavior.
Read the full piece here.