Moral Machines - From Value Alignment to Embodied Virtue
Class: PHIL-282
Authors: Wendell Wallach and Shannon Vallor
Introduction: Engineering Moral Machines
- Engineering Challenge: Creating AI with sensitivity to human values, norms, and laws has moved from theory to a real engineering problem.
- Value Alignment Approach:
- Popular with AI researchers, especially those concerned with superintelligence safety.
- Core Idea: Machines can learn human values by observing human behavior.
- Method: Uses computationally friendly concepts like "utility functions" and "agent preferences" instead of traditional ethical language (justice, duty, virtue); a toy sketch of this idea appears after this list.
- Philosophical Problems with Value Alignment:
- Observing human behavior shows what people do prefer (descriptive), not what they ought to prefer (normative). This is a classic "is/ought" problem.
- It conflates moral concepts with nonmoral facts, assuming that appropriate behavior can be reverse-engineered from data.
- The Authors' Argument:
- To be truly safe and beneficial, advanced AI must embody something like virtue or moral character, not just align with observed preferences.
- Virtue embodiment is a better long-term goal for AI safety than value alignment.
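A minimal sketch of what "learning values from observed behavior" can amount to, assuming a made-up two-feature utility model and a crude grid-search fit (the feature names, numbers, and fitting method are illustrative assumptions, not the authors' or any particular researcher's technique). Note that the recovered weights only describe what the observed person did prefer, which is exactly the is/ought worry above.

```python
# Toy sketch (illustrative assumptions only): infer the weights of a linear
# "utility function" from which of two options a person actually chose.

# Each observation: (features of option A, features of option B, option chosen)
# The two hypothetical features could be, e.g., "benefit to others" and "benefit to self".
observations = [
    ((0.9, 0.1), (0.2, 0.8), "A"),
    ((0.4, 0.6), (0.7, 0.3), "B"),
    ((0.8, 0.5), (0.3, 0.9), "A"),
]

def utility(features, weights):
    """Linear 'utility function': weighted sum of an option's feature values."""
    return sum(f * w for f, w in zip(features, weights))

def fit_weights(obs, steps=21):
    """Grid-search for the weight pair that best explains the observed choices."""
    best, best_hits = None, -1
    for i in range(steps):
        weights = (i / (steps - 1), 1 - i / (steps - 1))  # weights constrained to sum to 1
        hits = sum(
            1 for a, b, choice in obs
            if (utility(a, weights) > utility(b, weights)) == (choice == "A")
        )
        if hits > best_hits:
            best, best_hits = weights, hits
    return best

print(fit_weights(observations))  # weights that merely *describe* past choices
```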
13.1 Moral Machines and Value Alignment
- Value Alignment's Goal: To ensure that AI choices align with human values and interests; it is a strategy proposed by AI safety researchers.
- Focus: Initially focused on preventing existential risks from a future Artificial General Intelligence (AGI) or superintelligence.
- Machine Ethics Field: An older, interdisciplinary field (philosophy, computer science, etc.) that has long considered these challenges.
- Machine ethicists tend to focus more on near-term autonomous systems in common situations, while also considering future complexities.
13.2 Core Concepts in Machine Ethics
- Implicit vs. Explicit Moral Agents:
- Implicit agents: Their behavior is constrained by designers so they cannot perform forbidden acts.
- Explicit agents: Can make their own moral decisions when faced with unforeseen situations or conflicting values.
- Operational vs. Functional Morality:
- Operational Morality: Systems designed for "bounded moral contexts" where engineers can anticipate the ethical challenges and program appropriate responses in advance.
- Functional Morality: For more complex contexts, systems need subroutines to make explicit moral decisions.
- Future Challenges: As AI autonomy increases, it will need to recognize context, prioritize conflicting values, and perhaps use consequence analysis to make "good enough," if not perfect, decisions.
13.3 Values, Norms, Principles, and Procedures
- What is Utilitarianism? An ethical theory where the best action is the one that produces the best outcome, maximizing a single goal like aggregate welfare or happiness.
- Why is it Appealing to Engineers?
- It appears to be a straightforward calculation: for each option, subtract its undesirable consequences from its desirable ones and choose the option with the highest net value.
- It resembles the "utility functions" AI engineers are already familiar with, which can handle complex calculations with many variables; a toy version of this calculus appears after this list.
- Authors' View on its Problems:
- Lack of Information: It's often impossible to know all the consequences of an action or their probabilities, especially secondary effects or unforeseen "Black Swan" events.
- Definitional Issues: What exactly should be maximized? "Happiness" and "welfare" are hard to define and measure empirically.
- Beyond Machine Capabilities: In practice, human utilitarian decisions rely on experience, intuition, imagination, and planning—cognitive abilities that current AI systems lack.
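To see why the calculus looks engineer-friendly, here is a minimal sketch of an expected-utility choice, assuming made-up actions, probabilities, and welfare scores (all hypothetical, not from the chapter). The chapter's objections amount to asking where these numbers could honestly come from: unknown consequences and probabilities, and a "welfare" quantity that resists definition and measurement.

```python
# Toy sketch of the utilitarian calculus as an engineer might code it.
# The actions, probabilities, and welfare scores are invented for illustration.

# Each action maps to its anticipated consequences: (probability, welfare change)
actions = {
    "warn_pedestrian": [(0.9, +10), (0.1, -2)],
    "swerve":          [(0.6, +5),  (0.4, -20)],
    "do_nothing":      [(1.0, -15)],
}

def expected_utility(consequences):
    """Expected welfare: sum of probability * value over the known consequences."""
    return sum(p * v for p, v in consequences)

# Choose the action with the highest expected welfare.
best_action = max(actions, key=lambda a: expected_utility(actions[a]))
print(best_action, expected_utility(actions[best_action]))
```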
13.4 Top-Down, Bottom-Up, and Hybrid Approaches to Moral Machines
- Top-Down Approach:
- Starts with a pre-specified ethical theory (e.g., The Ten Commandments, Kant's categorical imperative, Asimov's Laws) and designs algorithms to implement it.
- Problem: Such systems can be rigid, struggle to resolve conflicting duties, and have difficulty tracking the ethically important features of a complex environment (the "frame problem").
- Bottom-Up Approach:
- Systems learn what is acceptable through experience, much like a child does. Value alignment is a bottom-up approach.
- Uses methods like machine learning and simulated evolution.
- Problem: Current machine learning is not robust enough to simulate the rich, unstructured learning humans use to acquire moral understanding. AI lacks the ability to learn from mental states, emotions, and relationships.
- Hybrid Approach:
- Integrates bottom-up learning with top-down principles or procedures.
- This is seen as the most promising path because it combines the adaptability of bottom-up learning with the normative guidance of top-down ideals; a toy sketch of such a hybrid appears after this list.
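A minimal sketch of the hybrid idea, with hypothetical action names, scores, and rules of my own (not a design from the chapter): a bottom-up learned preference score proposes candidates, and a top-down principle filter vetoes any candidate that violates an explicit prohibition.

```python
# Toy sketch of a hybrid architecture: learned scores (bottom-up) filtered by
# explicit principles (top-down). All names and numbers are illustrative.

# Stand-in for a bottom-up component: preference scores learned from human feedback.
learned_score = {
    "share_user_data": 0.9,   # learned preferences may rate a forbidden act highly
    "ask_for_consent": 0.7,
    "refuse_request":  0.2,
}

# Top-down component: an explicit, pre-specified prohibition.
def violates_principles(action):
    forbidden = {"share_user_data"}       # e.g., "never disclose private data"
    return action in forbidden

def choose_action(candidates):
    """Pick the highest-scoring candidate that passes the top-down filter."""
    permitted = [a for a in candidates if not violates_principles(a)]
    if not permitted:
        return "refuse_request"           # fall back to a safe default
    return max(permitted, key=lambda a: learned_score.get(a, 0.0))

print(choose_action(list(learned_score)))  # -> "ask_for_consent"
```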
13.5 The Limitations of a Hybrid Approach
- Limitations of Hybrid Approach: Even a hybrid system will likely fail in complex moral situations that humans with high moral intelligence can manage.
- Advanced Moral Capacities (that machines lack):
- Creative Moral Reasoning: Inventing new, appropriate moral solutions not determined by past experience.
- Moral Discourse: Cooperatively reasoning with other agents to negotiate moral solutions.
- Critical Moral Reflection: Critically evaluating one's own (and others') moral values and rules from a moral point of view.
- Moral Discernment: Recognizing new or subtle forms of moral importance and conflict.
- Holistic Moral Judgment: Making sense of a complex situation as a whole, beyond just adding up its ethical parts.
- How Humans and Machines Differ:
- Moral Expertise: The existence of human moral experts, however rare, is what allows human ethics to adapt and improve over time. Machines lack this capacity for progressive improvement.
- Embodiment: Humans have a rich set of embodied and affective capacities (affective empathy, hormonal signals, environmental sensitivity) that provide a massive flow of salient data about moral life. A machine without a body is at an immense informational disadvantage.
- Moral Understanding vs. Data Processing: Human understanding is semantic (based on meaning), while machine learning manipulates symbolic units that don't map onto real-world moral concepts without human interpretation. A machine can track data patterns from a face but has no semantic grasp of what a "face" means to a human.
- Moral Imagination: Humans can project themselves into possible futures and feel the moral weight of different choices. This allows for moral heroism. This capacity depends on our embodied, affective nature, which machines lack.
13.6 Virtue Ethics & Virtuous Machines
- What is Virtue Ethics?
- An ethical approach grounded in the character traits of a morally excellent agent (e.g., wisdom, honesty, justice) rather than rules or consequences.
- Virtues are context-adaptive skills that enable successful navigation of moral problems, even novel or "wicked" ones.
- Goal for Advanced AI:
- To be truly trustworthy, AI would need an analog to human virtue.
- This is the "gold standard" of moral capacity we must aim for if we want machines that can be fully entrusted with our safety.
- Challenge of Aligning Virtue with AI:
- AI can learn by modeling human exemplars (like in value alignment), but this is not enough.
- Reason 1 (Internal State): True virtue requires a deep understanding of the ethical field, not just mimicking behavior. This understanding is what allows an agent to know when to deviate from a learned pattern.
- Reason 2 (True Moral Standard): Virtuous behavior promotes long-term human flourishing (eudaimonia), an objective condition of social health. An AI modeling a "virtuous" person in an unjust society (e.g., an oppressive regime) would learn vice, not virtue.
13.7 Virtuous Agents: Four Key Capacities
For an AI to be truly virtuous, it would need analogs to four sophisticated capacities that humans possess. The practical obstacles to engineering these are immense.
- Moral Understanding:
- A holistic, integrated awareness of moral life that comes from embodied engagement with the world, not just a stored "map" of it.
- Why it's hard for AI:
- Human understanding is semantic (based on meaning); machine learning is symbolic (it tracks data patterns without grasping what they mean).
- Humans have a rich set of embodied and affective capacities (empathy, hormones, physical attunement) that provide a massive flow of morally salient data. An AI without a body is at an "immense informational disadvantage".
- Moral Perception:
- The ability to detect and track specific morally important features in an environment, especially novel ones.
- Why it's hard for AI: It relies on the full range of embodied and affective channels to intuitively "sense" a moral situation, which current AI lacks.
- Moral Reflection:
- The ability to evaluate one's own moral values from a higher-order perspective: to genuinely want to be morally better than one is.
- Why it's hard for AI: A machine that can do this could correct its own bad training. But this requires the machine to want to want something different, a capacity rooted in our embodied, reflective desires.
- Moral Imagination:
- The ability to project moral understanding into possible futures and feel the moral weight of different choices.
- Why it's hard for AI: Human moral motivation seems to require affective projection—the ability to feel what moral failure would be like. This capacity depends on our embodied, affective nature, which machines lack.
13.8 Virtue Embodiment
- Core Problem: Moral phenomena have inherent embodied meaning because they are linked to the flourishing or degradation of living beings.
- The Human Moral System: The entire human organism is a "moral navigation system," not just a specific part of the brain. Each of us takes direction from our relationship to the environment as a whole.
- The AI Disadvantage: An AI lacks the embodied faculties needed to navigate and integrate the holistic moral character of the social world. This is the key obstacle for machine ethics.
- The Path Forward: To build maximally safe and reliable AI, we may eventually need to engineer an analog to this embodied experience—an approach called artificial virtue embodiment.
13.9 Summary
- Asimov's Laws Showed Limits: Isaac Asimov's Three Laws of Robotics demonstrated that a simple, top-down rule-based morality is insufficient for ensuring proper robot behavior in complex situations.
- Asimov's Insight: He intuited that the laws must be a foundational, intrinsic feature of the robot's "positronic brain," not just a program added later. This is similar to the authors' argument that virtue must be deeply embodied.
- Near-Term Reality: For the next few decades, AI systems will likely operate in bounded moral contexts, where engineers add specific algorithms and constraints for those limited environments.
- Long-Term Goal: Trustworthy AGI will require a fundamental rethinking of system design. From the very beginning, the system must have a capacity for integrated moral learning and a drive toward embodied virtue.
- Conclusion: Only through the natural and necessary acquisition of something very much like virtue will advanced AI be truly trustworthy.