Avoiding Extreme Global Vulnerability as a Core AI Governance Problem

By AI Safety Fundamentals Team (Published on November 8, 2022)

Much has been written framing and articulating the AI governance problem from a catastrophic risks lens, but these writings have been scattered. This page aims to provide a synthesized introduction to some of these already prominent framings. [1] This is just one attempt at suggesting an overall frame for thinking about some AI governance problems; it may miss important things.

Some researchers think that unsafe development or misuse of AI could cause massive harms. A key contributor to some of these risks is that catastrophe may not require all or most relevant decision makers to make harmful decisions. Instead, harmful decisions from just a minority of influential decision makers—perhaps just a single actor with good intentions—may be enough to cause catastrophe. For example, some researchers argue, if just one organization deploys highly capable, goal-pursuing, misaligned AI—or if many businesses (but a small portion of all businesses) deploy somewhat capable, goal-pursuing, misaligned AI—humanity could be permanently disempowered [2].

The above would not be very worrying if we could rest assured that no actors capable of these harmful actions would take them. However, especially in the context of AI safety, several factors are arguably likely to incentivize some actors to take harmful deployment actions:

- Misjudgment: Assessing the consequences of AI deployment may be difficult (as it is now, especially given the nature of AI risk arguments [3]), so some organizations could easily get it wrong, concluding that an AI system is safe or beneficial when it is not.
- “Winner-take-all” competition: If the first organization(s) to deploy advanced AI are expected to get large gains while leaving competitors with nothing, competitors would be highly incentivized to cut corners in order to be first [4], since they would have less to lose.
- Externalities: By default, actors who deploy advanced AI first by cutting corners would stand to receive all of the potential benefits of their deployment while bearing only a small fraction of the added global risk (especially if they are only concerned with the interests of a small group).
- Race to the bottom: The above dynamics may involve a dangerous feedback loop. If I expect someone to deploy advanced AI unsafely or to misuse it, I am incentivized to cut corners to beat them to it, even if I am completely informed and concerned about all the risks. After all, I may think that my deployment would be less dangerous than theirs. (And that may incentivize them to cut more corners, in a vicious cycle.) [5]

Multiplying [6] the difficulty of the above problem, a few factors are arguably likely to create many opportunities for actors to carry out catastrophically harmful deployment:

- Delayed safety: There may be a substantial delay between when some organization knows how to build powerful AI and when some organization knows how to do so safely. After all, such safety delays are common in many industries. Additionally, it may be infeasible to solve AI safety problems before risky AI capabilities are created, since these capabilities may provide testbeds and tools that are critical for solving safety problems.
  - (This delay may be the period of especially high risk; soon after this delay ends, risks from unsafe AI may be greatly reduced, because incentives to deploy it may be lower and safe AI may increase humanity’s resilience.)
- Rapid diffusion of AI capabilities: Soon after some actor becomes capable of deploying unsafe AI, many other actors may also gain that capability. After all, recent AI advances have diffused quickly (including internationally) [7], information security weaknesses could cause AI advances to diffuse even faster, the number of actors explicitly aiming to develop general AI has been increasing, and that trend may accelerate when general AI is seen as being more within reach.

Overall, then, we may be headed toward a significant period of time in which many actors will have the capacity and incentives to (unintentionally) deploy catastrophically harmful AI. This would be a highly risky state of affairs—a vulnerable world, a recipe for catastrophe.
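To make the shape of this concern concrete, here is a minimal back-of-the-envelope sketch. It is not taken from the writings this page synthesizes: the function name, its parameters, and the numbers are hypothetical, and it assumes (unrealistically) that every actor independently makes a harmful deployment with the same probability in each year.

```python
def prob_any_harmful_deployment(n_actors: int, p_per_actor_year: float, years: int) -> float:
    """Probability that at least one actor makes a harmful deployment.

    Illustrative assumption (not from the source): every actor independently
    makes a harmful deployment with the same probability in each year.
    """
    if n_actors < 0 or years < 0:
        raise ValueError("n_actors and years must be non-negative")
    if not 0.0 <= p_per_actor_year <= 1.0:
        raise ValueError("p_per_actor_year must be between 0 and 1")
    # Chance that no actor deploys harmfully in any year of the window.
    p_nobody = (1.0 - p_per_actor_year) ** (n_actors * years)
    return 1.0 - p_nobody


if __name__ == "__main__":
    # Made-up inputs: more actors or a longer window both raise the risk.
    for n_actors, years in [(1, 1), (10, 1), (10, 5), (50, 5)]:
        risk = prob_any_harmful_deployment(n_actors, 0.01, years)
        print(f"{n_actors:>3} actors, {years} year(s): {risk:.3f}")
```

Under these assumptions, the three ingredients named above map onto the three inputs: the number of capable actors, each actor's chance of a harmful decision, and the length of the risky period. The combined risk rises with each of them, though by less than their simple product would suggest.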

Does the existence of this possibility spell doom? Not necessarily. In the above situation, risk comes from the fact that many actors are able and incentivized to deploy dangerous AI for a substantial period of time [8]. Many aspects of that situation can at least theoretically be changed, reducing risk [9], for example through the following high-level approaches:

- Nonproliferation: Actors might (coordinate to) slow the diffusion of the capability to deploy overly risky AI [10], reducing the number of actors who can unilaterally make harmful decisions.
- Deterrence: Actors might (coordinate to) create disincentives (e.g., political, regulatory, economic, cultural) against deployment actions that cause global risk, countering problems from externalities. [11]
- Assurance: Actors might (coordinate or create mechanisms to) credibly assure each other that they are not developing or deploying overly risky AI, reducing others’ incentives to preempt them with even more rushed deployment.
- Awareness: Actors might help potential AI deployers be well-informed about risks, reducing misjudgment.
- Sharing: AI developers might (coordinate or create mechanisms to) credibly commit to sharing the benefits and influence of AI, mitigating harmful “winner-take-all” incentives. [12]
- Speeding up safety: Actors might shorten (or, if possible, eliminate [13]) the period in which dangerous deployment decisions are possible but (affordable) protective technologies have not yet been developed, e.g., through technical work on safety.

(In addition to their relevance to AI alignment, most of these arguments arguably transfer to concerns over catastrophic misuse of narrow AI, as well as to broader concerns over value erosion from competition, or perhaps other coordination failures among AI systems. That suggests that putting guardrails on AI competition is robustly valuable. However, some researchers worry that an overly centralized approach to AI development could carry risks of its own, such as bad “value lock-in” or AI-enabled authoritarianism.)

Footnotes

[1] In particular, this document’s framing of the AI governance problem aims to synthesize several framings that have collectively been highlighted in writing by the researchers Bostrom, Christiano, Critch, Dafoe, Ord, Yudkowsky, and Zabel and Muehlhauser, among others, especially the following framings:
- Concerns over “value erosion from competition,” or a “race to the bottom” on AI, i.e., dynamics of intense, unconstrained competition leading to bad outcomes
- Expectations that advanced AI deployment will be highly centralized or that it will be highly decentralized
- The “Vulnerable World Hypothesis,” including concerns over risks from many actors having the ability to unilaterally take harmful action with AI
- Framing of AI governance as a coordination problem
- Concerns that there will be tradeoffs between the performance and safety (or other important values) of AI systems, and in particular that AI safety will add significant development time to any AI project
- Concerns over the potential for bad “value lock-in,” such as lasting authoritarian regimes, especially if AI governance is too coordinated
[2] Other examples: If just one actor misuses AI by making and deploying weapons that are widely harmful (e.g., bioweapons), that could cause global devastation. And if just one or a few actors pursue great influence at the expense of important values, they may gain that influence while lastingly reducing the value of how it will be used.
[3] Overconfidence may be especially likely if good arguments for AI being risky happen to be complex, theoretical, or focused on catastrophes that have never happened yet. See, e.g., the paper “Risks From Learned Optimization in Advanced Machine Learning Systems” for an example of arguments that some researchers see as correct despite their potentially unintuitive wrappings.
[4] This can be thought of as a commitment problem: the initially leading AI developer cannot credibly commit to sharing its gains, so other AI developers create global risk to get ahead, leaving everyone worse off than they could have been.
[5] Wouldn’t I anticipate this and therefore not cut corners in the first place? Hopefully, but I may lack the foresight to do that, I may be willing to take the risk of the other actor undercutting me (in hopes that it won’t happen, or to strengthen my negotiating position by demonstrating resolve), and I may have political incentives to pursue a merely temporary lead. Alternatively, if I am close enough to deployment, others may be unlikely to beat me by further cutting corners.
[6] Technically the effect of these factors isn’t quite multiplicative.
[7] About 1 year after OpenAI announced GPT-3, a language model with an unprecedented number of parameters, researchers at the Chinese company Huawei released a comparably large model (though some researchers caveat that large language models in China have been created mostly through replication and that they have significantly worse performance). Within about half a year after that, a total of roughly 6 AI developers, including ones in South Korea and Israel, had released similarly large language models. Additionally, within 2 months after OpenAI announced the cutting-edge image generation model DALL-E 2, Google released a similar (and by some metrics better) model.
[8] Note that we reached our conclusion about potential danger without assuming that AI safety or alignment are extraordinarily difficult technical problems.
[9] Not only is constraining competition plausibly (and debatably) necessary for AI safety; it is also arguably nearly sufficient for ensuring that AI is safe. With competition constrained, AI developers’ incentives for self-preservation may be enough for them to invest sufficiently in safety, especially if preliminary safety research and advocacy has been done.
[10] This could mean slowing the diffusion of knowledge about how to create risky AI, but it could also mean slowing the diffusion (or restricting the availability) of critical inputs other than knowledge. Mergers or joint R&D projects could also reduce the number of independent unilateralists.
[11] Additionally, deterrence of proliferation can advance nonproliferation.
[12] Potential commitment mechanisms include: the Windfall Clause, the institutional and cultural commitment mechanisms of OpenAI’s Charter and LP Announcement, new multilateral institutions, and AI-based mechanisms.
[13] Eliminating this delay would of course be ideal; it would nearly solve these problems. However, for the reasons discussed in the earlier section on delayed safety, this may be infeasible. If that’s the case, then technical safety work is in principle insufficient for making it very likely that AI is safe; the problems discussed in this document would still arise and need to be dealt with.