HUMAN COMPATIBLE: Artificial Intelligence and the Problem of Control

“What if we succeed?” Specifically, what happens if we, the AI scientists of the world, succeed in building ever-smarter, ever-more-capable machines, eventually rivaling or surpassing the intelligence of humans? Will the world be better off, or worse? That’s the provocative question at the heart of Stuart Russell’s new book, Human Compatible.

Russell is eminently qualified to ask and answer this question. He is coauthor of Artificial Intelligence: A Modern Approach, the preeminent textbook on AI. He is also a distinguished AI researcher and computer scientist at the University of California, Berkeley, where he heads the Center for Human-Compatible AI. Put simply, Russell is one of the world’s foremost AI experts — which makes his book’s thoroughgoing critique of conventional AI especially powerful.

Machine Superintelligence Is (Sort of) Here

Russell starts off by taking issue with those who argue that machine superintelligence is unimaginably far off, or even impossible. He makes the case that narrow superintelligence already exists — there are many arenas in which machines are already superior to humans — and says a key thrust of AI research is toward making machines with broader and broader skill sets.

In fact, Russell asserts, these “partially superintelligent” systems will, individually and collectively, begin to pose many of the same issues that a generally intelligent system would. (Parenthetically, he also notes that superintelligence in machines does not imply or require machine consciousness, a subject about which, he says, he and the rest of the scientific community know essentially nothing.)

Russell concludes that whether more generalized superintelligence in machines proves to be entirely beneficial or, instead, “the last event in human history,” depends on one crucial thing: “retaining absolute power over machines that are more powerful than us.” But that, it turns out, is quite a bit more difficult than most of us would imagine (remember HAL 9000 in the movie 2001: A Space Odyssey).

It also gets to the heart of the problem with AI development as currently pursued. In what Russell refers to as the “standard model” of AI development:

We build optimizing machines, we feed objectives into them, and off they go. That worked well when the machines were stupid and had a limited scope of action; if you put in the wrong objective, you had a good chance of being able to switch off the machine, fix the problem, and try again.

As machines designed according to the standard model become more intelligent, however, and as their scope of action becomes more global, the approach becomes untenable. Such machines will pursue their objective, no matter how wrong it is; they will resist attempts to switch them off; and they will acquire any and all resources that contribute to achieving the objective.

All of which leads Russell to conclude that the standard model for AI development is fundamentally flawed. “It works only if the objective is guaranteed to be complete and correct, or if the machinery can easily be reset. Neither condition will hold as AI becomes increasingly powerful.”

Three Principles for Beneficial Machines

Instead, Russell argues that AI scientists should seek to maximize machine beneficence, not machine intelligence. Echoing Isaac Asimov’s famous Three Laws of Robotics, he proposes Three Principles for Beneficial Machines:

  • The machine’s only objective is to maximize the realization of human preferences.
  • The machine is initially uncertain about what those preferences are.
  • The ultimate source of information about human preferences is human behavior.

The second principle is especially critical:

That the machine is initially uncertain about what human preferences are, is the key to creating beneficial machines. A machine that assumes it knows the true objective perfectly will pursue it single-mindedly. It will never ask whether some course of action is OK, because it already knows it’s an optimal solution for the objective. It will ignore humans jumping up and down screaming, “Stop, you’re going to destroy the world!” because those are just words . . . On the other hand, a machine that is uncertain about the true objective will exhibit a kind of humility: it will, for example, defer to humans and allow itself to be switched off.
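The logic of that passage can be made concrete with a toy calculation. What follows is a minimal sketch under stated assumptions, not Russell’s own formulation: the machine holds a belief about how much humans would value a proposed action, and it can act at once, switch itself off (worth zero), or defer to a human who is assumed, for simplicity, to know the true value and to permit the action only when that value is positive. All function names and numbers are hypothetical.

```python
# Toy model of the switch-off decision. All names and numbers are hypothetical.
# A belief is a list of (utility, probability) pairs for one proposed action.

def act_value(belief):
    """Expected utility of acting immediately, without consulting the human."""
    return sum(p * u for u, p in belief)

def defer_value(belief):
    """Expected utility of deferring to the human.

    Assumption: the human knows the true utility and allows the action
    only when it is positive; otherwise the machine is switched off (0).
    """
    return sum(p * max(u, 0.0) for u, p in belief)

def gain_from_deferring(belief):
    """How much better deferring is than the machine's best solo option."""
    best_alone = max(act_value(belief), 0.0)  # 0.0 = switching itself off
    return defer_value(belief) - best_alone

# An uncertain machine: the action may be good (+10) or very bad (-20).
uncertain = [(10.0, 0.5), (-20.0, 0.5)]
# A machine that is certain the action is good.
certain = [(10.0, 1.0)]

print(gain_from_deferring(uncertain))  # 5.0
print(gain_from_deferring(certain))    # 0.0
```

In this sketch the uncertain machine gains from letting the human decide, while the machine that is certain of its objective gains nothing and so has no reason to allow itself to be switched off.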

If, instead, conventional AI continues along its current course, Russell says we risk a fate akin to that of King Midas, the legendary ruler in Greek mythology, who got exactly what he asked for: that everything he touched would turn to gold. Too late, he discovered that “everything” means everything. Having inadvertently turned his food, drink, and family members into gold, he died in misery and starvation. In fact, this theme, that humans are insufficiently wise to safely wield superpowers, is ubiquitous in human mythology.

Given that, Russell wants nothing less than to reorient the entire course of AI development:

In a nutshell, I am suggesting that we need to steer AI in a radically new direction if we want to retain control over increasingly intelligent machines. We need to move away from one of the driving ideas of twentieth-century technology: machines that optimize a given objective.

Beyond the clever writing, the enlightening parables, and the trenchant analysis, Russell has written a surprisingly radical book. His critique of the current course of AI development is, essentially, that it has been built upon a great hubris: a belief that humans are sufficiently smart, indeed sufficiently godlike, to specify precisely what superintelligent, super-powerful machines should optimize. In this, Russell suggests, we are like children playing with matches.

Viewed from the vantage point of my own Christian religious tradition, Russell’s book exposes a deeply flawed premise at the heart of the entire AI enterprise. The progression toward superintelligence tantalizes with humankind’s recurring fever dream: ‘We want to be like God.’ About which Adam and Eve might offer a cautionary rebuke.
