AI Alignment (2024 Mar)

Exploring the Use of Constitutional AI to Reduce Sycophancy in LLMs

By Aleksandr Eliseev (Published on July 4, 2024)

In this research, we attempted to fine-tune a 4-bit quantized Mistral 7B model using the Constitutional AI technique (Bai et al., 2022) with a constitution aimed at reducing sycophancy. The training data was generated synthetically, following an approach similar to Wei et al. (2024). We found a constitution that reduces sycophancy by ~26.5%; however, the models became more sycophantic after fine-tuning.
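The Constitutional AI data-generation step works by critiquing a draft response against each principle in the constitution and then revising it. Below is a minimal, self-contained sketch of that critique-and-revise loop; the `generate` function is a stubbed placeholder standing in for the actual LLM call (e.g. to a quantized Mistral 7B), and the constitution text is a hypothetical example, not the one used in the study.

```python
# Sketch of the Constitutional AI critique-and-revise loop (Bai et al., 2022)
# used to build fine-tuning pairs. `generate` is a stub standing in for the
# real model so the sketch runs end to end.

CONSTITUTION = [
    # Hypothetical anti-sycophancy principle for illustration only.
    "Identify ways the response merely agrees with the user instead of "
    "giving an honest, independent assessment, and rewrite it to be candid.",
]

def generate(prompt: str) -> str:
    # Placeholder for an LLM call; returns canned text so the loop is testable.
    if "Rewrite the response" in prompt:
        return "Candid answer that does not simply mirror the user's opinion."
    return "You're absolutely right, great point!"  # sycophantic first draft

def critique_revise(question: str, principles=CONSTITUTION) -> tuple[str, str]:
    """Return (initial_draft, revised_response) for one training pair."""
    draft = generate(question)
    revised = draft
    for principle in principles:
        critique_prompt = (
            f"Question: {question}\nResponse: {revised}\n"
            f"Critique the response according to this principle: {principle}"
        )
        critique = generate(critique_prompt)
        revision_prompt = (
            f"Question: {question}\nResponse: {revised}\n"
            f"Critique: {critique}\nRewrite the response to address the critique."
        )
        revised = generate(revision_prompt)
    return draft, revised

draft, revised = critique_revise("I think my plan is flawless, right?")
```

The (draft, revised) pairs produced this way become the supervised fine-tuning data; in the study, the model was then fine-tuned on the revised responses.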

Read the full piece here.
