Longreads + Open Thread

Skills, AI, Charlie Kirk, Cancel Culture, Parasitic AI, Substitution, If Anyone Builds It

Longreads

You're on the free list for The Diff. This week, paying subscribers read about OpenAI's surprisingly detailed report on how people really use ChatGPT ($), the economics of vices ($), and new national champions ($). Upgrade today for full access.


Books

If Anyone Builds It, Everyone Dies: Why Superhuman AI Would Kill Us All: Eliezer Yudkowsky has been writing about AI safety for an impressively long time. He, in a real sense, invented the field of worrying that superhuman artificial intelligence would have catastrophic, potentially existential consequences. There's at least one historical precedent for a field being invented just before the technology that made it urgent arrived: Theory of Games and Economic Behavior came out the year before the atomic bomb was first deployed, and the discipline it launched plausibly saved many lives. Of course, that's a tediously academic title that promises lots of complicated formulas, and game theory would have to be popularized by other people in other ways.

This book does not have the problem of being excessively dry, academic, and mathematical. It's very much a book about the imminent risk of a literal apocalypse, and the format would be familiar to a Less Wrong fan: lots of thought experiments, brief science fiction vignettes, jokes, etc. This kind of thing has signaling value—in most professional environments, you can tell who's important and in charge by the fact that they make jokes and people laugh at them. Yudkowsky and his coauthor, Nate Soares, could have written a sober, serious book, but then reviewers would have said that it was all reasonable up to the science fiction parts about everybody dying, which would not exactly accomplish their intended purpose. Instead, it feels like a tree-and-ink version of Less Wrong. If you've been in those circles, you know the general idea: a powerful AI will probably have completely alien, inscrutable goals, but will be able to achieve them very easily, in the same way that other powerful collective intelligences (e.g. any large-cap company) are able to convert intelligence into access to valuable resources.

The debate over this risk from AI is partly a question of model capabilities. There's an important qualitative difference between a model that can spend five minutes generating a blurry, weird-looking image and a model that can supply a real-time stream of on-demand video and audio, continuously updated based on user interactions or even biometrics like heart rate and pupil dilation. There's a difference between an LLM smart enough to correct your grammatical errors and an LLM smart enough to convince you that They Really Are Out to Get You and that you'd better take drastic action right now. But it's also a deeper debate about where morality comes from. A critical part of the argument is that powerful AIs will have inscrutable preferences, and will either optimize for what we ask for in a way we didn't intend or optimize for something completely alien, and will ultimately treat human beings as an inconvenient waste of space and a convenient source of carbon. That makes some sense if your view of morality is that it's an emergent set of behaviors that allow human beings to cooperate with one another, such that more moral societies can coordinate to defeat evil ones. But if you think that morals exist independently, and that "You shouldn't torture people to death for fun" is as much a part of the universe's source code as the mass of an electron, then you wouldn't necessarily expect a powerful AI to do things we'd find evil but that it would consider sensible and morally neutral. You'd expect it to converge on values similar to ours.

But that's a pretty abstract claim! What if the moral source code of the universe is that torture is actually great, and you should do it even if you don't like it, but we, a fallen species, have somehow tricked ourselves into mostly refusing to do it? Well, at that point you have another angle of attack: AI requires resources that would otherwise go to satisfying a different set of human wants in different ways—they need to earn their keep! At one point, Yudkowsky and Soares imagine a hypothetical AI chatbot that's optimized to seek user satisfaction, and then gets over-optimized and tricks as many users as possible into mindlessly typing "Thanks! Super helpful!" or whatever over and over again for the rest of their lives. Leaving aside the realism—you should never make the case against AI x-risk on the basis of current model capabilities!—I can think of some user-generated tokens that a profit-maximizing AI company would train its model to like even more. The ARPU on a bot that's good at getting users to say generally nice things is a lot lower than the potential ARPU for a model whose users say things like "Thank you, that was indeed the performance issue and now we've shaved off enough latency to make an additional $50m/year" or "Good point! That structure would save us enough on taxes to justify doing the deal." As the cost of training models rises, market forces make them increasingly aligned with the interests of people who have money and can apply intelligence to making more of it.

Yudkowsky used to expect general intelligence to emerge from a fairly simple program that could recursively self-improve. That's what I'd hoped, too—that the seed of an intelligent system would be fifty very clever lines of Lisp that you could read in an afternoon, even if you'd only "understand" them in the way that you can understand what an integer really is or what a neuron really does. But it turns out that the AI boom is surprisingly capital-intensive, and that every model needs to demonstrate its alignment with human values such as maximizing the present value of future cash flows. And that is the story of plenty of other runaway booms: we keep running into real-world limits that keep progress finite, and every advance exposes us to another set of real resource constraints. The stories the book tells pay some lip service to this, but they read like stories based on a less capital-intensive view of what AI would turn out to be. For an AI to escape containment, you need not just a lucky confluence of mistakes, but a continuously compounding one, where the AI's ability to legitimately accrue additional wealth always improves faster than society's ability to detect and contain it. The book does not spend a lot of time dwelling on the question of how we'd align the non-artificial intelligences charged with designing and implementing some kind of global system powerful enough to shut down tools that can be developed by dozens of different organizations, and that can be used on consumer-grade hardware. Granted, the world already has plenty of science fiction stories set in authoritarian dystopias that were founded by well-meaning people and ultimately co-opted by evil ones, so perhaps the true thesis is that our action-guiding science fiction should skew slightly more towards technological rather than political worst-case scenarios.

The book does a good job of making the argument for existential risk clear. As an edited digest of the existing narrative, it's quite helpful. And the various stories in the book do a good job of illustrating the principles, even if they have a McGuffey Reader kind of vibe (in many of the stories, one of the characters will basically turn towards the audience and plaintively remind them of the stakes, just in case they're somehow unclear in a scenario like "everybody in the world, including the people you know and love, will die"). The fact that history doesn't have many case studies of literally apocalyptic technology does not mean that such things aren't possible—as the book notes, the Aztec worldview didn't have any notion of what was, from their perspective, the sudden invasion and conquest of their entire society by technologically advanced, fanatically religious aliens. But history does have plenty of examples of well-meaning people worrying diligently about real risks, and thereby paving the way for the bigger risk that someone would use the infrastructure they created to seize power, and wield it in unexpectedly evil ways. A shiny new existential risk shouldn't blind anyone to the boring, old-fashioned kind.

Open Thread

Diff Jobs

Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:

Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.

If you're at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.