AI Alignment (2024 Mar)

Ethical Alignments in AI: How Training Data Shapes Moral Foundations

By Scott Barlow (Published on July 4, 2024)

In the rapidly evolving field of artificial intelligence, understanding the ethical alignment of models is crucial. The Moral Foundations Theory (MFT) provides a robust framework to evaluate the moral underpinnings of AI models. This post delves into the technical evaluation of several prominent AI models using the Moral Foundations Questionnaire (MFQ), comparing these results to human cultural benchmarks, and discussing the implications for AI alignment and applications. Evaluating AI through ethical lenses ensures these technologies align with diverse human values, enhancing their applicability and acceptance.

AI models can exhibit significantly different moral values, largely influenced by their training data. For instance, the Jais-30b model, primarily trained on Arabic data, tends to emphasize loyalty, authority, and purity principles. In contrast, the Llama-3–70b model prioritizes fairness and harm reduction. This variation highlights how training data shapes the ethical alignments of AI models, making it crucial to consider these differences when selecting models for specific applications.

Read the full piece here.