Why Might Misaligned, Advanced AI Cause Catastrophe? (Compilation)

By AI Safety Fundamentals Team (Published on August 8, 2023)

You may have seen arguments (such as these) for why people might create and deploy advanced AI that is both power-seeking and misaligned with human interests. This may leave you thinking, “OK, but would such AI systems really pose catastrophic threats?” This document compiles arguments for the claim that misaligned, power-seeking, advanced AI would pose catastrophic risks.

What do skeptics say in response?

We’ve included a list of links to counterarguments to AI risk in the optional resources on our site.

Skeptics often dispute the premise that people will develop and deploy these kinds of AI systems in the first place. Others accept that premise but are optimistic that people will develop strong defensive measures, perhaps in response to signs of danger. Overall, experts remain highly uncertain and disagree substantially on these issues.

Caveats about the excerpts

While this compilation is not comprehensive, it aims to cover the most prominent arguments. As another caveat, many of the original sources include footnotes that, for brevity, are omitted here. And since these are only excerpts, they do not provide the full context or details of the original texts.

We’ll see arguments for the following claims, which are mostly independent reasons for concern:

- Humanity’s past holds concerning analogies
- AI systems have some major inherent advantages over humans
- AIs could come to out-number and out-resource humans
- People will face competitive incentives to delegate power to AI systems (giving AI systems a relatively powerful starting point)
- Advanced AI would accelerate AI research, leading to a major technological advantage (which, if developed outside of human control, could be used against humans)

Humanity’s past holds concerning analogies

From “Is Power-Seeking AI an Existential Risk?” (Carlsmith, 2021):

The choice to create agents much more intelligent than we are should be approached with extreme caution. This is the basic backdrop view underlying much of the concern about existential risk from AI—and it would apply, in similar ways, to new biological agents (human or non-human).

Some articulate this view by appeal to the dominant position of humans on this planet, relative to other species. For example: some argue that the fate of the chimpanzees is currently in human hands, and that this difference in power is primarily attributable to differences in intelligence, rather than e.g. physical strength. Just as chimpanzees—given the choice and power—should be careful about building humans, then, we should be careful about building agents more intelligent than us.

This argument is suggestive, but far from airtight. Chimpanzees, for example, are themselves much more intelligent than mice, but the “fate of the mice” was never “in the hands” of the chimpanzees. What’s more, the control that humans can exert over the fate of other species on this planet still has limits, and we can debate whether “intelligence,” even in the context of accumulating culture and technology, is the best way of explaining what control we have.

More importantly, though: humans arose through an evolutionary process that chimpanzees did nothing to intentionally steer. Humans, though, will be able to control many aspects of processes we use to build and empower new intelligent agents.

Still, some worry about playing with fire persists. As our own impact on the earth illustrates, intelligent agents can be an extremely powerful force for controlling and transforming an environment in pursuit of their objectives. Indeed, even on the grand scale of earth’s history, the development of human abilities in this respect seems like a very big deal—a force of unprecedented potency. If we unleash much more of this force into the world, via new, more intelligent forms of non-human agency, it seems reasonable to expect dramatic impacts, and reasonable to wonder how well we will be able to control the results.

From “Cortés, Pizarro, and Afonso as Precedents for Takeover” (Kokotajlo, 2020):

Summary

In the span of a few years, some minor European explorers (later known as the conquistadors) encountered, conquered, and enslaved several huge regions of the world. That they were able to do this is surprising; their technological advantage was not huge. (This was before the scientific and industrial revolutions.) From these cases, I think we learn that it is occasionally possible for a small force to quickly conquer large parts of the world, despite:

1. Having only a minuscule fraction of the world’s resources and power

2. Having technology + diplomatic and strategic cunning that is better but not that much better

3. Having very little data about the world when the conquest begins

4. Being disunited

Which all suggests that it isn’t as implausible that a small AI takes over the world in mildly favorable circumstances as is sometimes thought.

AI systems have some major inherent advantages over humans

From “AGI safety from first principles” (Ngo, 2021) (formatting edited):

[The first paragraph below is included as context on general AI more broadly. The remaining paragraphs discuss advantages that AI systems have over humans.]

The key distinction I’ll draw [here] is between agents that understand how to do well at many tasks because they have been specifically optimised for each task (which I’ll call the task-based approach to AI), versus agents which can understand new tasks with little or no task-specific training, by generalising from previous experience (the generalisation-based approach). [...] [T]here are many economically important tasks which I expect AI systems to do well at primarily by generalising from their experience with very different tasks - meaning that those AIs will need to generalise much, much better than our current reinforcement learning systems can. [...] some jobs crucially depend on the ability to analyse and act on such a wide range of information that it’ll be very difficult to train directly for high performance on them. Consider the tasks involved in a role like CEO: setting your company’s strategic direction, choosing who to hire, writing speeches, and so on. Each of these tasks sensitively depends on the broader context of the company and the rest of the world. [...] These variables are so broad in scope, and rely on so many aspects of the world, that it seems virtually impossible to generate large amounts of training data via simulating them (like we do to train game-playing AIs). And the number of CEOs from whom we could gather empirical data is very small by the standards of reinforcement learning (which often requires billions of training steps even for much simpler tasks).

[...]

I think it’s difficult to deny that in principle it’s possible to build individual generalisation-based AGIs [i.e. artificial general intelligences] which [surpass humanity’s collective ability in virtually all domains of interest], since human brains are constrained by many factors which will be much less limiting for AIs.

- Perhaps the most striking is the vast difference between the speeds of neurons and transistors: the latter pass signals about four million times more quickly [so advanced AI systems may be able to think far more quickly than humans do]. Even if AGIs never exceed humans in any other way, a speedup this large would allow one to do as much thinking in minutes or hours as a human can in years or decades.

- Meanwhile our brain size is important in making humans more capable than most animals - but I don’t see any reason why a neural network couldn’t be several orders of magnitude larger than a human brain.

- And while evolution is a very capable designer in many ways, it hasn’t had much time to select specifically for the skills that are most useful in our modern environment, such as linguistic competence and mathematical reasoning. So we should expect that there are low-hanging fruit for improving on human performance on the many tasks which rely on such skills.
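[Added illustration, not part of the original excerpt: the speed claim in the first bullet above can be checked with simple arithmetic. The sketch below takes the "four million times" signal-speed ratio from the excerpt. Whether an AI system's effective thinking speed would scale with that raw ratio is an assumption, so a few smaller, purely hypothetical speedup factors are included for comparison. The function names are ours.]

```python
# Julian year in seconds (about 31.6 million).
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def wall_clock_seconds(human_years: float, speedup: float) -> float:
    """Wall-clock seconds needed to match `human_years` of human-speed thinking."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return human_years * SECONDS_PER_YEAR / speedup


def describe(seconds: float) -> str:
    """Format a duration using the largest convenient unit."""
    if seconds < 60:
        return f"{seconds:.1f} seconds"
    if seconds < 3600:
        return f"{seconds / 60:.1f} minutes"
    if seconds < 86400:
        return f"{seconds / 3600:.1f} hours"
    return f"{seconds / 86400:.1f} days"


# 4e6 is the ratio quoted in the excerpt; the smaller factors are
# hypothetical "effective" speedups, included only for comparison.
for speedup in (1e4, 1e5, 1e6, 4e6):
    for years in (1, 10):
        duration = describe(wall_clock_seconds(years, speedup))
        print(f"speedup {speedup:>9,.0f}x: {years:>2} human-year(s) takes {duration}")
```

[At the full four-million-fold ratio, a decade of human-speed thinking compresses to roughly a minute. The "minutes or hours" range in the excerpt also holds for effective speedups several hundred times smaller.]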

[...]

In terms of replication, AIs are much less constrained than humans: it’s very easy to create a duplicate of an AI which has all the same skills and knowledge as the original. The cost of compute for doing so is likely to be many times smaller than the original cost of training an AGI (since training usually involves running many copies of an AI much faster than they’d need to be run for real-world tasks). Duplication currently allows us to apply a single AI to many tasks, but not to expand the range of tasks which that AI can achieve. However, we should expect AGIs to be able to decompose difficult tasks into subtasks which can be tackled more easily, just as humans can. [...] [These arguments] are also reasons why individual AGIs will be able to surpass us at the skills required for coordination (such as language processing and theories of mind).

AIs could come to out-number and out-resource humans

From “AI Could Defeat All Of Us Combined” (Karnofsky, 2022) (formatting edited):

[...] I want to be clear that I don't think the danger relies on the idea of "cognitive superpowers" or "superintelligence" - both of which refer to capabilities vastly beyond those of humans. I think we still have a problem even if we assume that AIs will basically have similar capabilities to humans, and not be fundamentally or drastically more intelligent or capable. I'll cover that next.

How AIs could defeat humans without "superintelligence":

If we assume that AIs will basically have similar capabilities to humans, I think we still need to worry that they could come to out-number and out-resource humans, and could thus have the advantage if they coordinated against us.

[...] I'm using the [...] framework in which it's much more expensive to train (develop) this [AI] system than to run it (for example, think about how much Microsoft spent to develop Windows, vs. how much it costs for me to run it on my computer). [One quantitative model] implies that once the first human-level AI system is created, whoever created it could use the same computing power it took to create it in order to run several hundred million copies for about a year each.

This would be over 1000x the total number of Intel or Google employees, over 100x the total number of active and reserve personnel in the US armed forces, and something like 5-10% the size of the world's total working-age population.
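[Added illustration, not part of the original excerpt: the comparisons above can be checked with rough arithmetic. The sketch below assumes 300 million copies as a stand-in for "several hundred million". The group sizes are rough, order-of-magnitude assumptions of ours, not figures from the source.]

```python
def scale_ratio(ai_copies: float, human_group_size: float) -> float:
    """How many times larger the AI population is than a human group."""
    if human_group_size <= 0:
        raise ValueError("human_group_size must be positive")
    return ai_copies / human_group_size


# Assumption: "several hundred million" copies taken as 300 million.
ASSUMED_AI_COPIES = 300e6

# Assumption: rough order-of-magnitude head counts, for illustration only.
ASSUMED_GROUP_SIZES = {
    "large tech company workforce": 150_000,
    "US active and reserve military personnel": 2_100_000,
    "world working-age population": 5_000_000_000,
}

for group, size in ASSUMED_GROUP_SIZES.items():
    ratio = scale_ratio(ASSUMED_AI_COPIES, size)
    if ratio >= 1:
        print(f"{group}: about {ratio:,.0f}x")
    else:
        print(f"{group}: about {ratio:.0%} of the group")
```

[Under these assumptions the ratios come out near 2,000x, 140x, and 6% respectively, in line with the "over 1000x", "over 100x", and "5-10%" figures in the excerpt.]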

And that's just a starting point.

- This is just using the same amount of resources that went into training the AI in the first place. Since these AI systems can do human-level economic work, they can probably be used to make more money and buy or rent more hardware, which could quickly lead to a "population" of billions or more.

- In addition to making more money that can be used to run more AIs, the AIs can conduct massive amounts of research on how to use computing power more efficiently, which could mean still greater numbers of AIs run using the same hardware. This in turn could lead to a feedback loop and explosive growth in the number of AIs.
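[Added illustration, not part of the original excerpt: a toy model of the feedback loop described in the two bullets above. It assumes each running copy earns enough per year to fund a fixed amount of new hardware, and that research raises the number of copies each unit of hardware can run by a fixed percentage per year. Both parameters and the function name are hypothetical. This is a minimal sketch of why the two effects compound, not a forecast.]

```python
def simulate_population(
    initial_population: float,
    years: int,
    reinvestment_rate: float,
    efficiency_gain: float,
) -> list[float]:
    """Return the AI population at year 0, 1, ..., `years`.

    reinvestment_rate: hardware units bought per running copy per year
        (one unit runs one copy at baseline efficiency). Hypothetical.
    efficiency_gain: fractional yearly increase in copies per hardware unit.
        Hypothetical.
    """
    hardware = float(initial_population)  # hardware units owned
    efficiency = 1.0                      # copies runnable per hardware unit
    populations = [hardware * efficiency]
    for _ in range(years):
        # Earnings from the current population buy more hardware.
        hardware += reinvestment_rate * populations[-1]
        # Research makes every hardware unit run more copies.
        efficiency *= 1.0 + efficiency_gain
        populations.append(hardware * efficiency)
    return populations


# Normalised start of 1.0, with hypothetical parameter values.
trajectory = simulate_population(1.0, years=5, reinvestment_rate=0.5, efficiency_gain=0.5)
growth_factors = [later / earlier for earlier, later in zip(trajectory, trajectory[1:])]
print([round(p, 2) for p in trajectory])
print([round(g, 2) for g in growth_factors])
```

[In this toy model the year-on-year growth factor itself rises over time, because efficiency gains make each reinvested dollar buy more copies. That rising factor is what separates the feedback loop from ordinary compound growth.]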

Each of these AIs might have skills comparable to those of unusually highly paid humans, including scientists, software engineers and quantitative traders. It's hard to say how quickly a set of AIs like this could develop new technologies or make money trading markets, but it seems quite possible for them to amass huge amounts of resources quickly. A huge population of AIs, each able to earn a lot compared to the average human, could end up with a "virtual economy" at least as big as the human one.

To me, this is most of what we need to know: if there's something with human-like skills, seeking to disempower humanity, with a population in the same ballpark as (or larger than) that of all humans, we've got a civilization-level problem.

[How could AIs threaten humanity if they are merely “virtual”?]

A potential counterpoint is that these AIs would merely be "virtual": if they started causing trouble, humans could ultimately unplug/deactivate the servers they're running on. I do think this fact would make life harder for AIs seeking to disempower humans, but I don't think it ultimately should be cause for much comfort. I think a large population of AIs would likely be able to find some way to achieve security from human shutdown, and go from there to amassing enough resources to overpower human civilization (especially if AIs across the world, including most of the ones humans were trying to use for help, were coordinating).

I spell out what this might look like in an appendix. In brief:

- By default, I expect the economic gains from using AI to mean that humans create huge numbers of AIs, integrated all throughout the economy, potentially including direct interaction with (and even control of) large numbers of robots and weapons. (If not, I think the situation is in many ways even more dangerous, since a single AI could make many copies of itself and have little competition for things like server space, as discussed in the appendix.)

- AIs would have multiple ways of obtaining property and servers safe from shutdown. For example, (a) they might recruit human allies (through manipulation, deception, blackmail/threats, genuine promises along the lines of "We're probably going to end up in charge somehow, and we'll treat you better when we do") to rent property and servers and otherwise help them out. (b) Or they might create fakery so that they're able to operate freely on a company's servers while all outward signs seem to show that they're successfully helping the company with its goals.

- A relatively modest amount of property safe from shutdown could be sufficient for housing a huge population of AI systems that are recruiting further human allies, making money (via e.g. quantitative finance), researching and developing advanced weaponry (e.g., bioweapons), setting up manufacturing robots to construct military equipment, thoroughly infiltrating computer systems worldwide to the point where they can disable or control most others' equipment, etc.

- Through these and other methods, a large enough population of AIs could develop enough military technology and equipment to overpower civilization - especially if AIs across the world (including the ones humans were trying to use) were coordinating with each other.

People will face competitive incentives to delegate power to AI systems

From “What Multipolar Failure Looks Like, and Robust Agent-Agnostic Processes (RAAPs)” (Critch, 2021):

[To illustrate the plausibility that competitive incentives will cause people to delegate power to AI, this piece presents a hypothetical narrative about how AI might be used in the future.]

Someday, AI researchers develop and publish an exciting new algorithm for combining natural language processing and planning capabilities. Various competing tech companies develop "management assistant" software tools based on the algorithm, which can analyze a company's cash flows, workflows, communications, and interpersonal dynamics to recommend more profitable business decisions. It turns out that managers are able to automate their jobs almost entirely by having the software manage their staff directly, even including some “soft skills” like conflict resolution.

Software tools based on variants of the algorithm sweep through companies in nearly every industry, automating and replacing jobs at various levels of management, sometimes even CEOs. Companies that don't heavily automate their decision-making processes using the software begin to fall behind, creating a strong competitive pressure for all companies to use it and become increasingly automated.

[...]

Some banks themselves become highly automated in order to manage the cash flows, and more [...] companies end up doing their banking with automated banks. Governments and regulators struggle to keep track of how the companies are producing so much and so cheaply, [...] so they demand that production web companies and their banks produce more regular and detailed reports on spending patterns, how their spending relates to their business objectives, and how those business objectives will benefit society. However, some countries adopt looser regulatory policies to attract more [...] companies to do business there, at which point their economies begin to boom in terms of GDP, dollar revenue from exports, and goods and services provided to their citizens. Countries with stricter regulations end up loosening their regulatory stance, or fall behind in significance.

From “GitHub Copilot now has a better AI model and new capabilities” (Zhao, 2023):

[Added context: GitHub is an Internet service (and the de facto industry standard) for hosting and managing code. GitHub Copilot is an AI that writes lines of code for software developers. The use of GitHub Copilot is an example of the broad and increasing delegation of tasks to AI systems. Such delegation constitutes delegation of power when a task is high-stakes and supervision is difficult or too expensive (in terms of labor costs).]

When we first launched GitHub Copilot for Individuals in June 2022, more than 27% of developers’ code files on average were generated by GitHub Copilot. Today, GitHub Copilot is behind an average of 46% of a developer’s code across all programming languages—and in Java, that number jumps to 61%.

Advanced AI would accelerate AI research, leading to a major technological advantage

From “Artificial Intelligence as a Positive and Negative Factor in Global Risk” (Yudkowsky, 2008) (content in brackets is reframed to reflect more recent developments in AI):

[A]n Artificial Intelligence might increase in intelligence extremely fast. The obvious reason to suspect this possibility is recursive self-improvement (Good 1965). The AI becomes smarter, including becoming smarter at the task of [developing advanced AI, leading to a rapid positive feedback loop].

From “AI Could Defeat All Of Us Combined” (Karnofsky, 2022):

[Under one view of how misaligned, advanced AI would pose a threat, risk comes from] an AI system that can do things like:

- Do its own research on how to build a better AI system, which culminates in something that has incredible other abilities.

- Hack into human-built software across the world.

- Manipulate human psychology.

- Quickly generate vast wealth under the control of itself or any human allies.

- Come up with better plans than humans could imagine, and ensure that it doesn't try any takeover attempt that humans might be able to detect and stop.

- Develop advanced weaponry that can be built quickly and cheaply, yet is powerful enough to overpower human militaries.