What risks does AI pose?
AI systems already pose significant risks, including harmful malfunctions, discrimination, reduced social connection, invasions of privacy and disinformation. Training and deploying AI systems can also involve copyright infringement and worker exploitation.
Future AI systems could exacerbate anticipated catastrophic risks, including bioterrorism, misuse of concentrated power, and nuclear and conventional war. We might also gradually or suddenly cede control to AI systems - or these systems might attempt to take control themselves.
This document compiles explanations of these risks, drawing heavily from previous overviews by academics. For brevity, we can't cover every possible risk but we encourage readers to further research areas that interest them.
Individual malfunctions
AI systems can make mistakes if applied inappropriately. For example:
- A self-driving car in San Francisco collided with a pedestrian who was thrown into its path by a human driver. This was arguably not its fault - however, after initially stopping, the car started moving again, dragging the injured pedestrian a further six metres along the road. Government investigators alleged that the company initially hid the severity of the collision from them.
- A healthcare chatbot deployed in the UK was heavily criticised when it advised users potentially experiencing a heart attack not to seek treatment. When a doctor raised these concerns, the company released a statement calling the doctor a "Twitter troll".
Furthermore, use of AI systems can make it harder to detect and address process issues. People tend to place too much trust in the outputs of computer systems. Additionally, because most AI models are effectively black boxes, and AI systems are much more likely than human processes to be shielded from court scrutiny, it can be hard to prove mistakes.
Discrimination
AI systems are trained with data that can reflect existing societal biases and problematic structures, resulting in systems that learn and amplify those biases (Turner Lee et al., 2019). For example:
- Hiring systems have shown biases against minorities when trained on data from industries that lack diversity.
- Algorithms used in the US criminal justice system have been shown to be biased against African Americans, resulting in longer periods of detention while awaiting trial.
Even AI systems without inherent biases may still exacerbate discriminatory practices because of the societal context in which they are deployed. For example, unequal access to AI knowledge and skills could further entrench inequality: those with higher incomes, more formal education, and greater exposure to technology are more likely to be aware of and use advanced AI tools to their benefit.
Reducing social connection
Two effects can result in greater polarisation of opinions, reducing social connection:
- Recommender systems can result in filter bubbles, where users are only shown a particular type of content - for example, that which agrees with their views.
- Optimising for engagement can result in amplifying divisive content, such as articles that promote moral outrage or misinformation, despite users not wanting this. This can push users to more extreme positions, although this is debated; the sketch below illustrates the basic mechanism.
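As a rough illustration of the second effect, here is a minimal sketch (our own toy example with made-up item names and scores, not drawn from any cited study) of how ranking a feed purely by predicted engagement can surface divisive content even though nobody explicitly chose to promote it:

```python
# Toy example (hypothetical items and scores): a feed ranked purely by
# predicted engagement surfaces the divisive item first, because outrage
# tends to drive clicks - no one explicitly chose to promote it.
items = [
    {"title": "Local park reopens",           "predicted_clicks": 0.04, "divisive": False},
    {"title": "Outrage piece about outgroup", "predicted_clicks": 0.11, "divisive": True},
    {"title": "Balanced policy explainer",    "predicted_clicks": 0.03, "divisive": False},
]

# The ranking objective is a proxy (predicted clicks), not what users
# would reflectively endorse seeing.
ranked_feed = sorted(items, key=lambda item: item["predicted_clicks"], reverse=True)

for item in ranked_feed:
    label = " (divisive)" if item["divisive"] else ""
    print(f'{item["title"]}{label}')
```

Real recommender systems are vastly more complex, but the core issue is the same: the training objective is a measurable proxy for what users actually value.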
In addition, chatbots may lead to unhealthy expectations in human relationships. Some virtual friend or romantic partner apps have millions of active customers, and it's uncertain what the results of using these apps will be.
Invasions of privacy
AI systems are often used to process personal data in unexpected or undesirable ways. For example:
- Period-tracking apps are used by over a third of women in the UK and US to help track their menstrual cycles, including 69% of UK women aged 18-24. These apps sometimes share information about their users for targeted advertising. Even when the companies building AI recommender systems do not intend this, AIs could learn from this data that it's effective to target people when they're most vulnerable to specific types of advertising, such as after a miscarriage, after becoming pregnant, or after recording particular symptoms such as mood swings or anxiety.
- AI-enabled facial recognition systems in public spaces can cause significant harms. One startup sold facial recognition tech to businesses and governments in the US, Panama, Brazil, Mexico, and the Dominican Republic, after training it on 30 billion photos scraped from the internet. This type of system could be misused by campaign groups or journalists to publish lists of people who visit LGBTQ centres, sexual health clinics or rehab facilities and might want to keep that information private. Data generated by such systems could be stored indefinitely with no malicious intent, yet later misused: during the Holocaust, many Jews were identified from church and government tax records. Recent examples of states criminalising previously legal behaviour, such as abortion in the US (2022) or homosexuality in India (2013), further amplify these privacy risks.
Copyright infringement
Copyrighted works are used to train many AI systems, including virtually all large language models and image generation models. Datasets are scraped from the public internet without the consent of rights holders - effectively meaning the original creators lose control over how their work is used.
In deployment, AI systems may recreate exact or close copies of copyrighted content. These outputs might enable 'copyright laundering', for example to reproduce unique art styles or remove copyleft protections from open-source software libraries.
It's uncertain how these legal issues will play out in the courts, and how policymakers will resolve the wider tension.
Worker exploitation
Training an AI often requires lots of human-annotated data. For example, content moderation systems might need thousands of examples of harmful content to train the model to identify what violates platform guidelines.
This work can repeatedly expose people to harmful content, damaging labellers' mental health. Because the work is often outsourced to contractors in countries with lower health and safety standards, workers are also less likely to be able to access appropriate support for mental health problems arising from their work. This outsourced labour can also result in workers being underpaid, overworked or otherwise mistreated (for example, silenced through NDAs or threats of dismissal).
Disinformation
AI-boosted disinformation could undermine societies’ ability to address catastrophes. Goldstein et al. (2023) identify three potential areas of change:
- Actors: Language models could drive down the cost of running influence operations, placing them within reach of new actors and actor types.
- Behavior: Influence operations with language models will become easier to scale, and tactics that are currently expensive (e.g., generating personalized content) may become cheaper. Language models may also enable new tactics to emerge—like real-time content generation in chatbots.
- Content: Text creation tools powered by language models may generate more impactful or persuasive messaging compared to propagandists, especially those who lack requisite linguistic or cultural knowledge of their target. They may also make influence operations less discoverable, since they repeatedly create new content without needing to resort to copy-pasting and other noticeable time-saving behaviors.
Bioterrorism
AI advances could worsen the risks of bioterrorism. Hendrycks et al. (2023) provide a useful introduction to this risk. The following is a lightly edited excerpt from their paper:
Bioengineered pandemics present a new threat. Biological agents, including viruses and bacteria, have caused some of the most devastating catastrophes in history. It’s believed the Black Death killed more humans than any other event in history, an astounding and awful 200 million, the equivalent of four billion deaths today. Engineered pandemics could be designed to be more lethal or easily transmissible than natural pandemics.
AIs could be used to expedite the discovery of new, more deadly chemical and biological weapons. In 2022, researchers took an AI system designed to create new drugs and tweaked it to reward, rather than penalize, toxicity. Within six hours, it generated 40,000 candidate chemical warfare agents. It designed not just known deadly chemicals including VX, but also novel molecules that may be deadlier than any chemical warfare agents discovered so far. In the field of biology, AIs have already surpassed human abilities in protein structure prediction and made contributions to synthesizing those proteins.
AIs compound the threat of bioengineered pandemics. AIs will increase the number of people who could commit acts of bioterrorism. General-purpose AIs like ChatGPT are capable of synthesizing expert knowledge about the deadliest known pathogens, such as influenza and smallpox, and providing step-by-step instructions about how a person could create them while evading safety protocols.
The exponential nature of biological threats means that a single attack could spread to the entire world before an effective defense could be mounted. Only 100 days after being sequenced, the omicron variant of COVID-19 had infected a quarter of the United States and half of Europe. Quarantines and lockdowns instituted to suppress the COVID-19 pandemic caused a global recession and still could not prevent the disease from killing millions worldwide.
Research into how much AI systems increase the potential for bioterrorism is ongoing. A recent study by RAND found that current safety-tuned LLMs did not substantially increase the risk beyond what was already available on the internet. However, a study by MIT and others found that, without safety tuning (which can be removed from open models like Llama 2 for $200), LLMs enabled people to obtain all the key information needed to synthesise 1918 influenza in under an hour.
Authoritarianism, Inequality, and Bad Value Lock-in
Dafoe (2020) describes ways AI could lead to power being concentrated and then misused:
- Global winner-take-all markets: Whoever leads in selling access to broadly capable AI systems may be able to offer many customers the best deal for a wide range of services. This could greatly concentrate wealth, which would incentivize authoritarian coups (while also making it easier to suppress democratic revolutions, as discussed below).
- Labor displacement: Historically, new technologies have often automated some jobs while creating new jobs. However, broadly capable AIs would be historically unprecedented. If AIs can do nearly every task at least as well as a human, this may leave few jobs for humans—especially since AIs do not need to rest, can learn vast amounts of information, and can often complete tasks far more quickly and cheaply than humans.
- Authoritarian surveillance and control: AIs can be used to flag content for censorship, analyze dissidents’ activities, operate autonomous weapons, and persuade people. This could allow totalitarian governments to surveil and exert complete control over the population without the need to enlist millions of citizens to serve as willing government functionaries. Additionally, AIs could make totalitarian regimes much longer-lasting; a major way in which such regimes have been toppled previously is at moments of vulnerability like the death of a dictator, but AIs which would be hard to “kill” could provide much more continuity to leadership, providing few opportunities for reform. (Hendrycks et al., 2023).
Concentration of power may make it easier to permanently entrench certain values. Hendrycks et al. (2023) argue:
Just as we abhor some moral views widely held in the past, people in the future may want to move past moral views that we hold today, even those we currently see no problem with. For example, the moral defects of AI systems would have been even worse if they had been trained in the 1960s, and many people at the time would have seen no problem with that.
War
Analysts have highlighted several pathways by which AI could increase the risk of war, including nuclear war.
AI could undermine nuclear deterrence. Nuclear war is often thought to be prevented partly by nuclear deterrence: if a state launched a nuclear strike, it would risk being nuked in retaliation. However, advances in AI could undermine states’ retaliatory strike capabilities, given states might use AI to…
- …locate nuclear-armed submarines, by using AI to analyze sensor data and perhaps improve other aspects of reconnaissance drone technology. This would be problematic, because states often view nuclear-armed submarines as their most resilient nuclear deterrent, due to the difficulty of locating them. However, the technical plausibility of this is debated.[1]
- …make mid-air adjustments to strikes against mobile missile launchers, e.g. through satellite image analysis. This would reduce the difficulty of destroying these weapons (which also have deterrent functions).
- …asymmetrically improve missile defense, which could neutralize rivals’ retaliatory strikes. However, missile defense has historically been highly difficult.
- …execute cyberattacks that disable rivals’ retaliatory capabilities.
AI could inadvertently escalate conflicts. While analysts consider it unlikely militaries would give AIs direct control over nuclear weapons, AI could escalate conflicts in other ways. Decision-makers might trust flawed recommendations about escalation, lethal autonomous weapons could inadvertently initiate or expand violent conflicts, and faster AI decisions might reduce the time available to de-escalate situations.
AI could create other incentives for war. International relations scholars often find “large, rapid shifts in the distribution of power lead to war” (Powell, 2006). When faced with a rising rival, states may need to choose between (i) declaring war while the rival is relatively weak, or (ii) later being threatened by a more powerful rival. This situation incentivizes war. Above, we saw one way AI might contribute to a large, rapid shift in power: undermining nuclear deterrent capabilities. AI might also cause other destabilizing power shifts, such as accelerating economic and technological development in certain states.
Gradual loss of control
We might gradually give power to AI systems, and over time this could result in humans not effectively having control over society or wasting most of our potential.
One way this could happen, based on Christiano (2019):
- There are strong economic incentives to give some control to AI systems - initially maybe just to automate some job tasks, then entire jobs, and eventually AI may be running entire companies or government institutions.
- We might give these systems easy-to-measure goals that are not really what we want (a form of outer misalignment). For example, an easy-to-measure goal for an AI running a company could be ‘increase the share price’. This might initially work, but eventually could lead to the corporation defrauding consumers, falsifying its accounts and hiding this from regulators. Instead, what we really want here is something closer to ‘cause the company to provide more legal and valuable products and services’ – although even this goal can likely be gamed if you think hard enough (a toy sketch after this list illustrates how optimising a measurable proxy can diverge from what we actually want).
- These systems might be hard for humans to understand and oversee if the strategies that models pursue are complex. Human oversight might also be cut simply because it’s costly. This could lead to these problems going unchecked.
- Even where there are obvious problems, economic incentives could push people to deploy systems anyway - particularly if they believe they are racing against others. For example, Microsoft launched Bing Chat early despite OpenAI’s warnings about inaccurate or unhinged responses.
- Combined and at scale, these factors could lead to a world where a lot is going wrong - and systems are too complex or powerful to fix. Humans may still be around, but are effectively no longer in control.
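To make the proxy-goal problem above concrete, here is a minimal sketch (a hypothetical toy example of ours, with made-up action names and numbers; it is not a description of how any real system is built):

```python
# Toy example (hypothetical actions and numbers): an agent trained to
# maximise an easy-to-measure proxy ("share price") can prefer actions
# that game the metric over actions that serve the goal we actually want.
actions = {
    # action: (effect on measured share price, effect on real value delivered)
    "improve_product":    (+2, +3),
    "inflate_accounts":   (+5, -4),  # great for the proxy, harmful in reality
    "mislead_regulators": (+3, -2),
}

def proxy_reward(action):
    return actions[action][0]  # what the system is optimised for

def true_value(action):
    return actions[action][1]  # what we actually wanted

chosen = max(actions, key=proxy_reward)
print(f"Chosen action: {chosen}")
print(f"Proxy reward: {proxy_reward(chosen)}, true value: {true_value(chosen)}")
# With these numbers the agent picks 'inflate_accounts': highest proxy
# reward, negative real value - a simple picture of outer misalignment.
```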
A common objection is that we wouldn’t deploy systems too complex to understand, or knowingly problematic, in positions of power - or that human oversight would be sufficient. However, there are many counterexamples to this, such as the British Post Office scandal: in the late 1990s, the Post Office rolled out a new IT system with bugs that caused accounting errors. As a result, it prosecuted hundreds of subpostmasters over more than a decade, insisting the system was reliable despite mounting evidence it was fundamentally flawed. It continued to fight court cases into 2019, even when it knew its court defence was false. Despite the human oversight supposedly offered by the court system, 236 subpostmasters were wrongly imprisoned, and the false prosecutions have been linked to at least four suicides.
Sudden loss of control
Alternatively, loss of control may happen very suddenly - often called a ‘hard takeoff’ or ‘FOOM’.
This could happen if an AI system becomes good enough to autonomously improve its own capabilities. In that case, its capability could increase at a potentially exponential rate: as the model becomes more capable, it also gets better at improving itself. This dynamic is known as an ‘intelligence explosion’, and it could unfold very rapidly - making it very hard for humans to control, or even react.
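To see why the feedback loop could be so fast, here is a toy model (our own illustration with arbitrary parameters, not a forecast): if each round of self-improvement yields a gain proportional to the system's current capability, capability grows exponentially.

```python
# Toy model (arbitrary parameters, not a forecast): each self-improvement
# step adds capability proportional to current capability, so small gains
# compound into exponential growth.
def simulate_self_improvement(initial_capability=1.0, gain_per_step=0.1, steps=50):
    capability = initial_capability
    history = [capability]
    for _ in range(steps):
        capability += gain_per_step * capability  # better systems improve faster
        history.append(capability)
    return history

trajectory = simulate_self_improvement()
print(f"Capability after 50 steps: ~{trajectory[-1]:.0f}x the starting level")
# ~117x with these toy parameters; a larger feedback coefficient steepens the curve.
```

In reality, returns to self-improvement could diminish, hold steady or compound even faster; the point is simply that this kind of feedback loop could outpace humans' ability to react.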
If uncontrolled, systems optimising for the wrong goal could be catastrophic: the outcomes of a gradual loss of control could be dramatically accelerated, or this could lead to an active takeover scenario (see below).
Even if such an intelligence explosion is controlled, and doesn’t immediately result in a catastrophe, it would give the actor in charge of the AI system a huge amount of power, which could be abused or otherwise lead to significant instability.
Active takeover
In both of the above scenarios, we may lose control without the AI system intending to disempower humans. However, this could be accelerated if AI systems intentionally try to do this.
They might be incentivised to do so because pursuing convergent instrumental subgoals such as self-preservation, gaining resources and maintaining power could help them achieve their overall goals better. Humans in power might pose a significant threat or obstacle to all of these subgoals.
Systems might initially try to hide this behaviour, and appear safe so that they aren’t switched off. Once they believe they are in a position where they can take over, they may suddenly act in a dangerous way - known as a treacherous turn.
Alternatively, they might just try to do this because they malfunction in high-stakes scenarios. Regardless of how ‘intentional’ this action is, it could still result in the same outcome. For example, Bing Chat sometimes threatens to kill or hurt people who criticise it (although luckily it doesn’t have the capability to carry out this threat), despite being a relatively simple text-prediction system without other structured goals.
It’s genuinely unclear whether the way we train systems will lead to this. Training systems (particularly with reinforcement learning) to take actions in the world with high self-awareness seems likely to increase this risk. Unfortunately, there are clear economic incentives pushing in this direction, as such systems are likely better able to augment or replace human labour.
It’s also very unclear whether our current approaches to AI safety will mitigate these threats. We still don’t know how to robustly make AI systems do what their creators want them to do. AI governance approaches often don’t address takeover risks, and even where they do, they only somewhat mitigate the threat. Many researchers worry that if highly intelligent AI systems actively optimise against safeguards, imperfect safeguards applied to a fundamentally misaligned model are likely to break at some point (which could happen quickly, in the case of an intelligence explosion).
Unknown Risks
New technologies often pose risks that are hard to foresee. As Dafoe (2020) mentions, risks from the combustion engine have included "urban sprawl, blitzkrieg offensive warfare, strategic bombers, and climate change"—presumably a hard list to predict in the 1800s.
Despite this, we can still indirectly prepare for them. We can make institutions better at identifying and responding to new AI risks as they emerge. We can also improve our responsiveness to currently unknown risks, in part by remembering that we might not yet know all the risks.
Footnotes
[1] Skeptics argue there are enormous challenges in using drones to reliably locate submarines, such as vast search areas, limitations of sensor technology, short battery lives, and low drone speeds. On the other hand, perhaps these problems could be overcome with technological advances (e.g. improved sensor data analysis) and innovative approaches (e.g. using relatively centralized recharging stations or surface-level solar panels to recharge drones).
This is different from the earlier concern about the lock-in of bad values. In an earlier section, we considered how concentration of power could lock in bad values. Here, the concern is that unconstrained competition could lock in bad values. (Of course, which values are bad is highly contested.)
[Footnote from the excerpt] Note that bargaining failure is not the only cause of catastrophic interactions. For instance, the interactions of Lethal Autonomous Weapon Systems might also be catastrophic.