Strip-Mining the Maybesphere
In this issue:
- Strip-Mining the Maybesphere—Training data for LLMs comes from the Maybesphere, the space of questions that are non-obvious enough to be worth asking, but probably have an answer. This is a narrow subset of the overall space of human knowledge.
- Regulatory Clarity—Everyone in AI is happier knowing that the US government is aware it's a big deal, but is also deferring some big decisions.
- Eurobonds—In the end, nobody's a senior creditor to the electorate.
- Just-In-Time—Entertainment/shopping convergence continues.
- Ratings—The game theory of the credit rating business, as explained by a defector.
- The Crypto Credit Cycle—Leverage raises risk and reward for individual market participants, but it also has this effect on the people who buy but don't borrow to do so.
Strip-Mining the Maybesphere
About half a century ago, Hans Moravec wrote an overview of the state of machine intelligence and some of the constraints that would apply. It's a wild read that jumps across many different domains—on one page, he's citing a Jacques Cousteau video he saw about squid, and a few pages later he's talking about the tradeoff between writing fast code in assembly and writing code fast in Lisp, or about roughly how many PDP-10s you'd need to simulate the human nervous system.[1] He concludes with something that would not be out of place on a Mag7 earnings call: "[W]e should realize our desperate need for more computing, and do things about it."
That essay is also an early exploration of another observation: evolution has a two-billion-year head start on creating intelligence, and we'll have to be clever to keep up. But what kind of cleverness? Moravec predicted that we'd have enough computing power to simulate a brain within a decade, and when that decade passed, computers were a lot more impressive, but artificial intelligence wasn't there yet. He observed that computers had gotten good at things humans started doing fairly late in our evolutionary history, like taking calculus exams, but struggled with things we've been doing a lot longer, like recognizing facial expressions and picking up objects. Some of the things that are quite difficult for most humans have the odd feature of being easy to explain to computers: it's easy for a programmer to write down the logical steps for calculating a derivative, but very difficult to do the same for gleaning insight from facial expressions. This was a massive limitation of the deterministic software paradigm. Classical ML and early deep learning made computers useful for things that were traditionally hard to explain to them (and sometimes hard to explain to humans as well: most people would struggle to lay out the thought process they use to tell a dog from a cat; they don't know exactly how they can tell the difference, they just can). Generative AI has the crucial feature of being useful for things that were traditionally difficult to explain to computers, but relatively easy to explain to an intelligent human being.
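As a toy sketch of that asymmetry (mine, not Moravec's): the "explain it to a computer" side of calculus really does fit in a few explicit lines, while nobody has written the equivalent short rule set for reading a face.

```python
# Toy sketch: the power rule for polynomials as explicit, enumerable steps.
# A polynomial is a list of coefficients, lowest degree first:
# [c0, c1, c2, ...] represents c0 + c1*x + c2*x^2 + ...

def differentiate(coeffs):
    """Apply d/dx (c * x^n) = c * n * x^(n-1) term by term."""
    return [n * c for n, c in enumerate(coeffs)][1:] or [0]

print(differentiate([3, 2, 5]))  # 3 + 2x + 5x^2 -> 2 + 10x, i.e. [2, 10]
```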
Now we have models with many more human capabilities—writing code, composing limericks, citing completely fabricated scientific research to bolster an argument, politely agreeing with someone who tells them something crazy, etc.—but there's a lot that these models can't do, and a lot that's missing from their world model.
When a computer gives the wrong answer to a question, it can only be because the logic was wrong or the underlying data was. We don't have access to the logic, and while it's possible for teams at different companies to be making the same mistakes at the level of system design, it's implausible that they're all producing the same accidental bugs.
So, data's not the only place to look, but it is the only place where you're likely to find something (unless you, personally, worked at one lab, got poached by another, and just realized that you and your former colleagues had messed something up). And here, we have some good material for thinking about why LLMs will excel in some domains and struggle in others.
In his wonderful essay on operating systems, In the Beginning was the Command Line, Neal Stephenson says:
In your high school geology class you probably were taught that all life on earth exists in a paper-thin shell called the biosphere, which is trapped between thousands of miles of dead rock underfoot, and cold dead radioactive empty space above. Companies that sell OSes exist in a sort of technosphere. Underneath is technology that has already become free. Above is technology that has yet to be developed, or that is too crazy and speculative to be productized just yet. Like the Earth's biosphere, the technosphere is very thin compared to what is above and what is below.
There's a similar thin slice in text: the Maybesphere, the set of all question-answer pairs where the question probably has an answer and is worth asking. There's the infinite space in every direction of questions that don't have an answer because they're ill-formed ("How might I green more fiveishly?") or that combine irrelevance with the impossibility of getting an answer ("What was Tutankhamun's third-favorite dessert?"). But there's also the bedrock of things so obvious that they just don't come up in conversation.
This doesn't just explain why LLMs fall for reverse-brainteasers, and why they used to get thrown off by trivial math problems. There isn't much training data where there's a story that sets up an obvious conclusion, and then delivers it. There aren't many pages enumerating all the obvious and easy-to-solve math problems, but you can read a lot about why the sum of all positive integers is negative one-twelfth. So if you ask a silly question, the relevant training data will skew towards silly-looking questions that are actually quite deep.
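(A parenthetical for readers who only know that result from internet arguments: as a sketch in standard notation, the famous value is not a convergent sum, it's the Riemann zeta function's analytic continuation evaluated at -1.)

```latex
% The series 1 + 2 + 3 + ... diverges; the "-1/12" is the value of the
% zeta function's analytic continuation at s = -1, not a literal sum.
\zeta(s) = \sum_{n=1}^{\infty} n^{-s} \quad (\operatorname{Re}\, s > 1),
\qquad \zeta(-1) = -\tfrac{1}{12}
```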
But this skew also explains why most LLMs tend to lean left on social issues. o3's rough estimate is that two-thirds of the training tokens come from the Internet, with books, public-domain and not, as the next-biggest chunk, so models have massively more information about things that turned into popular topics of discussion in the recent past. We have a lot more Reddit comments about Obergefell than Usenet posts about DOMA, so people who were paying attention to politics at that time are privileged with far more training tokens per capita than the ones who were equally-but-oppositely opinionated two decades earlier. And this explanation works regardless of your model of the underlying phenomenon:
- If you're a believer in what detractors call the Whig Theory of History, i.e. that we've tended to get more moral over time, then this is a bias in the models that happens to reverse a bias in reality, i.e. it's a helpful fix for the fact that you can read something written in 1950, or for that matter 1550, and risk absorbing all sorts of bad ideas.
- If you have a more tragic view of human history but also want to cope with the fact that we're obviously materially better-off than we used to be, you can argue that scientific progress made us able to afford worse behavior—modern car safety features make drunk driving less fatal than it used to be, modern medicine and birth control mean that infidelity is less risky, penny-pinching is a largely recreational activity in a country with a GDP per capita of $80k, etc.—but people still want to justify their behaviors, so there's large demand for clever arguments that misbehavior is not so bad after all.
- You might believe that these things are cyclical, and that some concepts of progress get ejected in order to make a cleaner narrative (to a Whig historian writing in the US in the late 1920s, all of our moral development from the dawn of humanity had finally culminated in making it illegal to have a glass of wine with dinner).[2] In that case, you have a different model that explains the same output: smarter, more verbal people will tend to figure out which way things are going and rush to write themselves into the front of the parade. So, once again, the more voluminous and interesting tokens are the ones saying that whatever our current notion of moral progress is, it's simply tremendous.
Do LLMs make good, or at least compulsively addictive, therapists? There's a lot of content about people's feelings, interpersonal interactions, etc. And that content will be disproportionately clustered in the ambiguous cases, where every argument can produce infinite counter-arguments. Are LLMs good at writing code? Lots of it gets shared for free, tutorials abound, and people argue about their favorite languages and illustrate their reasoning with code. Are LLMs pretty uneven when it comes to history? It really depends on whether or not it's a historical topic that's been trendy to argue about in the last twenty years or so.[3]
This has started to shape white-collar work already. There's a booming business in getting paid to answer questions in your narrow area of expertise, whether that's chemistry or music theory. The returns from taking good notes, and from mandating good note-taking organization-wide, have also gone up. Rough drafts used to be almost completely worthless, except to superfans, but if each draft is an input into the next, then a model that's trained on a series of drafts that turned into something great is learning the process, not the finished product—it's being trained to make something good, rather than something that has a lot of the superficial signs of being good. This also raises the value of hyperlinks in essays, and citations in academic research, for roughly the same reason.
But there's still the bedrock problem. How do you get a model that's smart enough to understand the set of all things that are obvious to a normal person but can't be deduced from first principles? You get there by digging down: identifying things people choose to write about, looking for what they could have hypothetically written about instead, and paying people to codify it, as tedious as that job is. Over time, something interesting starts to happen: you only have to dig down until you hit first principles, or things that can be directly deduced from them. That's probably why models do so well with code, where everything is built up from first principles, in layers of abstraction that are consciously built on one another. It would be a nice victory for the Platonists if there turned out to be other fields like that—if you could write down the rules for life and somehow deduce the existence of humans. Or if it turned out to be true of every field. It'll be a slog to find out, but the more money that gets spent on compute and on researchers, the higher the returns from incremental money spent on data. Even if it's the most tedious data imaginable—literally the set of true statements so prosaic that nobody in history has bothered to type them out before—it takes you to interesting places.
He also casually notes that humans are "probably not the most individually intelligent animal on the planet," citing bigger-brained species like sperm whales, and creatures with separately-evolved and quite alien nervous systems, like giant squid. So if you ever worry you've spent too much time on Twitter, Hacker News, grad school, etc., just remember that at least one smart, prescient person suspected that the two smartest species in the world keep busy mostly by squabbling with one another. ↩︎
These unprecedentedly moral people would be happy to tell you which subgroups in the US were the worst offenders in the pre-Prohibition era, whether that meant not-white, neither Anglo nor Saxon, or not-Protestant. One of the reasons people tend to age out of Whig History is that, while there is recognizable progress over time, the definition of progress tends to jettison and then completely forget about once-progressive ideas that turned out not to work. The coherent story you tell looking back on history would be confusing to the people who made it happen. ↩︎
This is a good reason why, even if LLMs end up replacing a lot of knowledge work, it's good to know things, and in particular good to have exceptionally deep knowledge in some domain. How good you are at the thing you're best at determines how powerful an LLM you can reasonably evaluate. If Grok is twice as good at physics as I am, but OpenAI's best model is three times as good, I'll have no way of knowing. But if there's some topic, however obscure, that I know a bit better than the best LLM, there's something to quiz them on. ↩︎
You're on the free list for The Diff. Last week, paying subscribers read about how temporarily-acceptable white-collar fraud will affect the 2028 campaign ($), the paradox that complex supply chains with many intermediaries have high consumer surplus ($), and why capturing low-hanging fruit produces an enduring strategic advantage ($). Upgrade today for full access!
Diff Jobs
Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:
- Well-funded ex-Stripe founders are building the agentic back-office automation platform that turns business processes into self-directed, self-improving workflows which know when to ask humans for input. They are initially focused on making ERP workflows (invoice management, accounting, financial close, etc.) in the enterprise more accurate/complete and are looking for FDEs and Platform Engineers. If you enjoy working with the C-suite at some of the largest enterprises to drive operational efficiency with AI and have 3+ YOE as a SWE, this is for you. (Remote)
- A blockchain company that’s building solutions at the limits of distributed systems and driving 10x performance improvements over other widely adopted L1s is looking for an entrepreneur in residence to spearhead (prototype, launch, grow) application layer projects on their hyper-performant L1 blockchain. Expertise in React/React Native required. Experience as a builder/founder with 5–10 years in consumer tech, gaming, fintech, or crypto preferred. (SF)
- A leading AI transformation & PE investment firm (think private equity meets Palantir) that’s been focused on investing in and transforming businesses with AI long before ChatGPT (100+ successful portfolio company AI transformations since 2019) is hiring Associates, VPs, and Principals to lead AI transformations at portfolio companies starting from investment underwriting through AI deployment. If you’re a generalist with a technical degree (e.g., CS/EE/Engineering/Math) or comparable experience and deal/client-facing experience in top-tier consulting, product management, PE, IB, etc. this is for you. (Remote)
- A company that was using ML/AI to improve software development/systems engineering before it was cool—and is now inflecting fast—is looking for a product marketing manager to articulate their value proposition and drive developer adoption. If you started your career in backend engineering or technical product management, but have since transitioned (or want to transition) into a product marketing seat, this is for you. (Washington DC area)
- A Series B startup building regulatory AI agents to help automate compliance for companies in highly regulated industries is looking for legal engineers with financial regulatory experience (SEC, FINRA marketing review, Reg Z, UDAAP). JD required; top law firm experience preferred. (NYC)
Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.
If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.
Elsewhere
Regulatory Clarity
This Zvi Mowshowitz piece is probably the only thing you'd need to read on the US's AI Action Plan. Even in the fractious world of AI, it got approval from both accelerationists and people worried about existential risk, which is hard to pull off. (Of course, adjacent to each of these groups are critics who consider them sellouts for being too moderate.) But one reason they can both agree on it is that it makes AI less political, which means both sides can get more of what they want, and it reduces the odds of high-stakes rulemaking in either direction. Part of the AI safety argument is that it's so hard to design good rules to constrain AI, and the more that rule-making process is boring and technocratic rather than the subject of median-voter-swaying rhetoric, the saner the rules will be.
Eurobonds
Dealbook recently had an interview on the subject of "Eurobonds," or joint EU sovereign debt. That was a follow-up to this piece proposing them in some detail. The argument for them is that the world needs a new reserve asset, or at least an alternative one—and probably also that European countries are going to be spending more on defense in the near future and can achieve lower borrowing costs if they club together.
But it's hard for this to be a good substitute for Treasury bonds. Every country's financial system will be more dependent on its own sovereign debt than on EU-wide debt, at least for a while, so the former is a more systemic liability for national leaders. And in any country that goes through a recession with accompanying austerity, these will be incredibly unpopular—the only thing worse than cutting government services and raising taxes in order to pay money to German lenders is doing the same spending cuts and tax increases in order to fund interest payments on debt that's also the obligation of German borrowers. So they make the EU more monetarily important, at the expense of making it politically flimsier.
Just-In-Time
About a month ago, The Diff speculated that the combination of short-video apps, food delivery apps, and ghost kitchens would radically accelerate the pace at which food trends spread ($, The Diff), which turned out to be wrong in two respects: wrong tense, and wrong apps. As it turns out, the actual concept is to make a TikTok-style app that's just videos of food that can be ordered directly from the app, and then to market it on TikTok. This is actually a huge improvement, because it saves the user a few taps, and because TikTok is such a natural platform to market it on: TikTok's audience demonstrates a clear affinity for short-form video, and TikTok doesn't have to worry that it's losing customers, since this is an app that only really makes sense to use three times a day.
Ratings
Ratings agencies are a strange business. They basically rent out their reputation to borrowers, and while they aren't on the hook financially unless they really mess up, they lose reputational capital every time someone they said was reliable turns out to default. In an unregulated market, this creates two strategies: either have strict ratings and charge a premium, or have a sloppy process and hope that nobody remembers who you are. One tiny insurance rating agency, Demotech, managed to use the latter approach—they keep giving insurers one of their highest ratings until just before default ($, WSJ). This is a case where regulation runs into a paradox: if there's a standard for the quality of ratings (or of the analysis behind them), and if consumers trust that quality, the optimal short-term move for raters is to adhere to the loosest standard that technically meets the rules. This eventually reaches an equilibrium, as it's hard to imagine that Demotech will be allowed to keep operating the same way. But in the meantime, policyholders are still waiting for their money. You can't eliminate irresponsible behavior in insurance, and the real mandate for insurance regulators is to make such behavior narrow in scope and, ideally, too boring and slow-moving to constitute a real opportunity.
The Crypto Credit Cycle
Back in 2021, a few friends started telling me about the high yields they were getting in crypto, which was a surprise to me because I couldn't figure out where that yield could come from. It turned out that there were a few ways to do this: automated market-making, margin lending, and Ponzi schemes. Of these, margin lending against an already-volatile asset is not a great business, but it at least has a transparent failure mode, and after the widespread losses from that previous credit expansion, the lenders are back ($, FT). This makes crypto more of a momentum asset, which is an interesting feature given that part of the pitch for crypto is that it's a better reserve asset than gold. The way to make it technically better is to design gold-but-digital, but the only way it can take share is for relative market caps to imply that a given token is a better store of value than gold. So leverage, by raising the odds of that, does make crypto worth more, on average, but it also makes it more volatile. And it's hard for those two factors to stay perfectly balanced for long.