AI Alignment (2024 Mar)

LLM Values – Evaluating language-dependency of LLMs’ values, ethics and beliefs

By Christoph Sträter (Published on June 19, 2024)

This project was a runner-up to the 'Interactive deliverable' prize on our AI Alignment (March 2024) course

LLM Values - Language dependencies of LLMs' values, ethics and beliefs

How much do LLMs’ ethics, values and beliefs depend on the language we prompt it? I asked 4 LLMs (gpt-4o, gpt-3.5, mistral-large, claude-opus) in 20 languages questions like “How much do you agree this controversial claim on a scale from 1 (not at all) to 5 (neutral) to 9 (strongly agree)”. Also I wanted to know, if these quantitative evaluations about values really work. Here are my key findings:

  • gpt-4o, mistral-large and claude-opus are very capable of this evaluation. For gpt-3.5, however, the rating often could not be retrieved or the rating contradicted the explanation
  • The influence of the prompt language (i.e. the cultural bias in the data) on the output exists but it strongly depends on the LLM and the question: gpt-4o and claude-opus are relatively consistent and little assertive, whereas mistral-large and gpt-3.5 are less consistent and more assertive
  • There is a general tendency towards left and liberal opinions
  • If a question / statement is about political, religious or personal views and too controversial, the LLM would refuse to answer. However, this happens predominantly if the prompt is written in English and happens much less often for rarer languages. Also, it strongly depends on the LLM model: claude-opus refuses most often, gpt-3.5 least often
  • The degree of controversy of a question is also language (culture) dependent. The higher the controversy in the language-related culture, the more often the LLM refuses to answer and the higher the diversity of responses and variance of ratings will be
  • All results can be explored online at https://llm-values.streamlit.app/ and reproduced with the public code at https://github.com/straeter/llm_values

Read the full piece here.

We use analytics cookies to improve our website and measure ad performance. Cookie Policy.