Zentrum für interdisziplinäre Forschung
The AI Risk and Safety Landscape
by Alexander Koch (Paris), Benjamin Paaßen (Bielefeld) and Elizabeth Herbert (München)
With the increasing deployment of large language models and other artificial intelligence systems, concerns about the associated risks are becoming increasingly pressing. While there is broad consensus in the scientific community that AI systems ought to be safe, there is surprisingly large disagreement about which risks are likely to occur and which safety strategies are promising for addressing them. The ZiF workshop “AI Risk and Safety Landscape” brought together researchers from computer science, philosophy, neuroscience, mathematics and social science to map out the breadth of AI-related risks and safety approaches and to identify the specific uncertainties and differing assumptions that explain the observed disagreement.
Regarding the range of risks, the workshop covered both tangible risks posed by present-day AI systems – such as discrimination, privacy violations, and technical failures due to an inherent lack of reliability or deliberate adversarial attacks – and catastrophic or even existential risks posed by future, potentially more advanced AI systems, such as hijacking critical infrastructure or releasing new types of viruses. As the latter kind of risk has never materialized, such risks are intrinsically difficult to estimate and hence associated with substantial disagreement in the scientific community. However, there appears to be agreement that most risk scenarios are not purely technical but socio-technical: automated discrimination, for example, becomes a risk because of how AI systems are integrated into high-stakes decisions such as hiring, medicine, or education. Many of the catastrophic risk scenarios also involve present-day societal trends – for example, people who are socially isolated may become emotionally and psychologically dependent on an AI companion, radicalize over time and ultimately perform catastrophically dangerous acts. One tension in AI risk research is the limited and unevenly distributed public attention: public discourse, as evidenced by media studies, focuses mostly on economic risks such as job displacement, to the detriment of other risk categories, especially those affecting marginalized populations.
The workshop also covered a wide span of safety approaches, ranging from technical methods and policy regulation such as the EU AI Act to social measures such as science communication and critical AI literacy – helping the public understand the ways in which AI systems are dissimilar to humans. One vision underlying many technical approaches is the notion of alignment, referring to the challenge of ensuring that AI systems act as humans would want them to in any situation. Example directions include research on mathematically specifying a zone of safe operation and proving that an AI system will never leave this zone, as well as approaches to interpreting and explaining the inner workings of AI systems, drawing on concepts from neuroscience and cognitive science. However, workshop participants also voiced concerns that alignment approaches may be fundamentally limited: there may always be rare or unspecified scenarios (e.g. due to hacking) that are not covered by a formal safety specification or by our interpretations of system behavior, and that therefore leave a residual risk.
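To give a flavour of what such a formal safety specification can look like, here is a minimal sketch introduced purely for illustration (the state set $S$, admissible initial states $S_0$, safe subset $S_{\mathrm{safe}} \subseteq S$, policy $\pi$ and environment dynamics $f$ are assumptions of this sketch, not notation used at the workshop): the verification goal is to prove that a system controlled by $\pi$ never leaves the safe zone,
\[
  s_0 \in S_0 \ \wedge\ s_{t+1} = f\bigl(s_t, \pi(s_t)\bigr) \ \text{for all } t \ge 0
  \;\Longrightarrow\; s_t \in S_{\mathrm{safe}} \ \text{for all } t \ge 0 .
\]
A proof of such a property typically exhibits an inductive invariant $I$ with $S_0 \subseteq I \subseteq S_{\mathrm{safe}}$ and $f(s, \pi(s)) \in I$ for every $s \in I$; the practical difficulty for learned, open-ended systems lies in writing down $S_{\mathrm{safe}}$ and $f$ at all, which is precisely where the residual-risk concern above applies.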
A cross-cutting concern was that the scientific community does not even agree on the capabilities and limitations of present-day AI systems, let alone future ones. This may be partly explained by researchers focusing on risks arising either from over-estimation or from under-estimation of AI capabilities. Over-estimating the capabilities of AI systems may lead to over-reliance, such as replacing human labor with AI systems even though the systems are not capable enough. Under-estimating AI capabilities, on the other hand, may increase the risk that those capabilities cause harm, especially catastrophic harm. Whether one focuses on the limitations or on the capabilities of present-day AI strongly influences which risk categories are considered most urgent.
Some research even addresses the question of whether future AI systems may develop consciousness and tries to devise tests for consciousness based on neuroscientific insights. However, the workshop participants agreed that consciousness and capabilities are conceptually distinct, with consciousness having little impact on AI systems’ risk potential. Given the vast scope of the AI risk and safety landscape, the ZiF workshop could naturally only take a first step in navigating it. Nonetheless, bridges have been built and signs of agreement are emerging: from both a tangible-risk and a catastrophic-risk perspective, further massive up-scaling of AI models appears inadvisable. Instead, research into smaller, special-purpose models operating in well-defined socio-technical contexts may be the way forward – at least as long as convincing safety approaches, from technical to social, are still being worked out.