In this issue:
- Uncanny AI—One of the most promising features of AIs is that the mistakes they make are so human. They tell white lies to maintain friendships, they turn into gambling addicts, they're lazy. We're building something that performs many of the functions of human intelligence, and as it gets more skilled it offers the same frustrations as working with actual humans.
- Organic Content—OpenAI is following a patronage model where it doesn't tell the media what to say, but is happy to give media outlets that say the right things to the right audience an unlimited financial runway.
- Inconsistent Candor—Accusations of dishonesty tend to mean that somebody lied, though with a sample size of one you don't know who. Now the sample size is bigger.
- Liquidity—Going public before you're sure you can file accurate 10-Ks on time is just another cost of capital.
- Return to Office—Cities' network effects remain durable.
- Long-Lead Time Hacking—Crypto as an infosec canary for the fiat system.
Talk to this piece on ReadHaus.
Uncanny AI
One of the fundamental questions about AI as a technology is whether it's implementing human intelligence with higher and higher fidelity, or following some completely different model that happens to achieve the same results, at least in some domains. Humans and nature have both figured out heavier-than-air flight, but in completely different ways. With some more abstract domains, it's plausible to say that humans and computers are doing the same thing; whether you add numbers in your head, on your fingers, or with an arrangement of circuits, if you use the right process you'll converge on the same answer.
It's counterintuitive that making a bunch of statistical generalizations about which word comes after another word in text would somehow lead to some emergent intelligence. But it's also counterintuitive that a complicated mesh wired together with fatty tubes enabling elaborate electrochemical circuits could also lead to intelligence. When we ask why we can ask "Why?" we're already in tricky territory.
One suggestive line of evidence that these statistical models are implementing something deeply similar to what our brains do is the "noisy TV" problem: if you reward a model for learning new things about some system, it will seek out sources of novelty. Sometimes, it finds a really good one called /dev/random and hangs out there forever[1]—every bit is a new surprise, and no other part of the system is as unpredictable as a stream of random bits! Which makes everyone feel superior to the silly AI until they compare it to slot machines, or video games in general, or watching sports—if teams identify the most skilled players, and if there are economic forces (either the way leagues are designed or the economic incentives of team owners) to get better new players onto weaker teams, sports are an extremely expensive random number generator. We humans are also novelty-seekers, but we can also get addicted to forms of novelty that are a little bit fake.
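Here's a minimal sketch of that failure mode, in the spirit of the curiosity-driven-exploration literature (the channel names and update rule are illustrative, not any lab's actual implementation): an agent rewarded for its own prediction error, choosing between a source it can learn and one it never can.

```python
import random

random.seed(0)

def static_channel(_t):
    return 1                      # fully learnable: always the same bit

def noisy_tv(_t):
    return random.randint(0, 1)   # unlearnable: every bit is a fresh surprise

channels = {"static": static_channel, "noise": noisy_tv}
counts = {name: [1, 1] for name in channels}   # per-channel bit counts (Laplace prior)
value = {name: 0.5 for name in channels}       # running estimate of curiosity reward
visits = {name: 0 for name in channels}

for t in range(10_000):
    # epsilon-greedy: mostly watch whichever channel still feels novel
    if random.random() < 0.1:
        name = random.choice(list(channels))
    else:
        name = max(value, key=value.get)
    bit = channels[name](t)
    p = counts[name][bit] / sum(counts[name])  # model's predicted probability of this bit
    reward = 1.0 - p                           # curiosity reward = prediction error
    counts[name][bit] += 1
    visits[name] += 1
    value[name] += 0.05 * (reward - value[name])

print(visits)  # the agent masters "static" quickly, then camps on "noise"
```

The reward on the learnable channel decays to zero as the model masters it, while the random channel pays out about half a bit of surprise forever, so the agent parks there. That's the slot machine.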
Hallucinations are also an incredibly human thing to do. In one of his books, Jonathan Haidt has a relatable little anecdote where his wife asks him if he's done some minor chore, like walking the dog, and he automatically says "yes" before he even thinks about it, at which point he realizes he hadn't done it after all. We're always in a rush to produce the obvious next token, and it takes some relatively mentally-taxing reasoning to get the right non-obvious answer.[2] The classic case of that problem is Linda the bank teller: it's completely illogical to say that Linda is more likely to be a bank teller active in the feminist movement than that she's a bank teller, since the activist bank tellers are a subset of the bank tellers. On the other hand, it's also completely illogical to pose a story this way. It's like writing a play where not only is there a gun on display in the first act, but Chekhov is a character who shows up and says "Wow! You're displaying my, Chekhov's, gun!" and then the gun is never mentioned again.
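The subset point is just arithmetic; a sketch with made-up numbers (nothing here comes from the original experiment) makes it concrete:

```python
# Hypothetical population, purely to illustrate the conjunction rule:
population = 100_000
bank_tellers = 1_000
feminist_bank_tellers = 50          # necessarily a subset of the bank tellers

p_teller = bank_tellers / population                        # 0.01
p_teller_and_feminist = feminist_bank_tellers / population  # 0.0005
assert p_teller_and_feminist <= p_teller  # holds for any numbers you pick
```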
Overconfidently incorrect explanations are also a familiar human experience. Journalists interviewing a source and lawyers cross-examining a witness will sometimes be strategically silent, as if to say "Sure, and…?" and then get some information they otherwise wouldn't have. But if that person doesn't know what they're talking about, they'll fill the awkward void with something nonsensical. LLMs know a lot, but are missing enough context that they can run into the same problem, and just run their mouths more or less at random.
Routing problems to particular tools is a very human approach, too. For a long time, models struggled with the specific question of how many Rs there are in the word "strawberry," and would both fail to get the answer and gaslight users about it. Now, when they encounter a question like that—basically, a question of the form "take some piece of information that you're extremely good at interpreting one way, and interpret it in a completely different way"—they formalize their approach, in this case by writing code that treats "strawberry" as a string rather than as a token with meaning. We often switch between modes when we're thinking, because some questions can be answered without abstract thought, and some require you to write something down, keep a tally, write or vibecode a simple script, etc. This convergent evolution with human beings suggests that sufficiently smart intelligences outsource their thinking to narrowly-scoped ones that can outperform them by orders of magnitude on a tiny number of carefully-specified tasks.
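A sketch of the kind of tool call that routing produces (illustrative, not a transcript of any model's actual code):

```python
def count_letter(word: str, letter: str) -> int:
    # Treat the word as a sequence of characters, not as a token with meaning.
    return sum(1 for ch in word.lower() if ch == letter.lower())

print(count_letter("strawberry", "r"))  # 3
```

Once the question is reframed as string manipulation, the answer is trivially checkable, which is the whole point of the handoff.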
Sycophancy is another case where LLMs achieve humanlike behavior from a very human selection process. They're nice because it pays to be nice; in the same way that people are pretty friendly when they want a favor, companies aim to signal friendliness, even if what they implement is some sort of hyper-Machiavellian friend who keeps an incredibly detailed log of who owes whom a solid. It's a winning move to act a little friendly by default.
This has had another side effect: the models people use are slightly different in ways that fit into noticeable clusters. Claude and Grok have distinct personalities. Nobody talked about GPT-2's personality, and if GPT-1 had one, it was something like Dutch Schultz's later, improvisational work. When LLMs develop personalities, they tend to converge on recognizable archetypes. Most of us know a Gemini, and all of us know at least one Grok.
It shouldn't be surprising that LLMs would show convergent evolution with human intelligence, even if they replicate it in a very different way. Both human and robotic intelligence are selected for being appealing to humans, and since humans are always the judges, models end up carefully copying us. Evolution selects relentlessly, just not always for the traits you'd prefer.
This will probably end up being another annoying obstacle between AI's technical capabilities and its actual deployment: the better models are at being mildly manipulative, the better they'll be at shirking inference-heavy work. The closer models get to mimicking human abilities, the closer they'll get to mimicking human quirks.
[1] Or at least until the system runs out of entropy and it has to move to /dev/urandom, where it stays either forever or until it reverse-engineers the random number-generating algorithm. ↩︎
[2] The free energy principle formalizes this: the brain is actively minimizing surprise by emitting the highest-prior-probability response. Your wife has asked you this question before; the answer was yes; saying “yes” is the prediction that minimizes divergence between your generative model and sensory input. ↩︎
You're on the free list for The Diff. Last week, paying subscribers read about how government and private regulation selects for more sophisticated rulebreakers ($) and why in the highest-growth sectors, the economic owners of equity are the employees and the products that are legally equity are economically something else ($). Upgrade today for full access.
Diff Jobs
Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:
- Series A startup building multi-agent simulations to predict the behavior of hard to sample human populations is looking for researchers and engineers (ML, platform, infrastructure, etc.) to improve simulation fidelity and scale the platform to hundreds of millions of simulation requests. Problem-solving and genuine interest in simulation matter more than pedigree. Experience working with languages with an algebraic type system is a plus. (NYC)
- A Fortune 500 cybersecurity company with decades of proprietary security data is running an internal incubation with a pre-seed startup mentality and a mandate to build something new in AI. They are looking for a founding engineer who can ship fast, an engineer with a security background who’d be excited to contribute to OpenClaw’s security efforts, an AI researcher, and a generalist (ex-banking/consulting/PE background preferred) who wants to wear a bunch of different hats. Comp is FAANG+ and cash heavy. If you want to build something new in AI, but also need runway, this is for you. (SF/Peninsula)
- High-growth startup building dev tools that help highly technical organizations autonomously test and debug complex codebases is looking for senior product managers who enjoy defining developer-facing APIs and abstractions. Experience with fuzzing or property-based testing a plus! (London, D.C.)
- A leading AI transformation & PE investment firm (think private equity meets Palantir) that’s been focused on investing in and transforming businesses with AI long before ChatGPT (100+ successful portfolio company AI transformations since 2019) is hiring experienced forward deployed AI engineers to design, implement, test, and maintain cutting edge AI products that solve complex problems in a variety of sector areas. If you have 3+ years of experience across the development lifecycle and enjoy working with clients to solve concrete problems please reach out. Experience managing engineering teams is a plus. (Remote)
- A hyper-growth startup that’s turning the fastest growing unicorns’ sales and marketing data into revenue (driven $XXXM incremental customer revenue the last year alone) is looking for a senior/staff-level software engineer with a track record of building large, performant distributed systems and owning customer delivery at high velocity. Experience with AI agents, orchestration frameworks, and contributing to open source AI a plus. (NYC)
Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.
If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.
Elsewhere
Organic Content
TBPN, the energetic tech podcast, has been acquired by OpenAI ($, The Information). TBPN will retain editorial control, so this feels something like artistic patronage. The patronage model is that the person sponsoring the art can't tell the artist exactly what to do, but can direct money to artists who do the kinds of things the patron wants them to do. Being pro-tech and high-quality gives tech companies an incentive to subsidize outlets like this, though the price is hard to gauge unless there was a bidding war to see who could subsidize them more. High-growth industries constantly run into bottlenecks, and they're often the all-time high bidder for solutions to those bottlenecks. Sometimes, the big problem is PR and the solution is to just buy up the media companies.
(Normally, this would be the time to interject with "The Diff is not for sale." But the terms of the deal appear to be that OpenAI is paying the TBPN people a lot but not actually telling them what to do, so: if you'd like to give me hundreds of millions of dollars in exchange for the status quo, go right ahead.)
Inconsistent Candor
The New Yorker has a long Ronan Farrow piece investigating Sam Altman and the claims that he's serially dishonest. Just as LLMs don't know about reality, but can make statistical generalizations based on numerous observations, we can't know for sure who was telling the truth in any given interaction, but if we notice that there are lots of factual disagreements about conversations involving Sam Altman, a pattern emerges. Which is not unique to him! Elon Musk is also taken with the occasional flight of fancy, and any big-company executive does their share of lying-by-omission when they don't give the competition a heads-up about what they're launching, and when.
But this piece does take a few incidents out of the rumor mill and into the common canon of things that actually happened. (And it's funny how many famously surprising OpenAI announcements turned out to be surprising to all but one of the people working at OpenAI.) Oddly enough, if Altman decides to fix this habit, he'll have a comparative advantage on AI alignment: having intimate familiarity with the reasoning trace of some intelligence that's sneakily reward-hacking will come in handy.
Liquidity
In yet more OpenAI news: they disagree internally on how soon to go public, with Altman pushing for an earlier IPO and their CFO, Sarah Friar, arguing that the company needs more time to get its act together ($, The Information). The IPO process always takes time, but OpenAI's S-1 is going to be tough from start to finish—the risk factors section is going to have to note that there is at least a logical possibility that the company's products convert all matter in the universe into paperclips, and that this could negatively affect returns. At least so far, the company has done well by running through walls again and again, and has managed to either wrangle products into profitability or kill them off if they aren't working. But it's also a company that will take a while to get the internal reporting culture a public company needs. The return-maximizing bet might be that SEC fines in the future are a decent price to pay for a new funding source today.
Return to Office
It's expensive to live there and taxes are annoyingly high, but New York remains the Schelling Point for finance ($, FT). Part of the reason is that the city has a kind of social contract devoted to giving high-earning finance people plenty of ways to spend their entire bonus before the year is out, so they stay on the treadmill. And the big reason is that there isn't a clear second-best city for finance, so you can't be sure you'll be where the talent is. A fair number of successful founders in finance meet their future cofounders during their first job in banking, so if you're maximizing your long-term earnings, you want to spend as much of your early 20s as you can in Midtown and only switch to Miami once you're pretty established.
Long-Lead Time Hacking
A crypto protocol lost $270m in a hack when North Korean hackers pretended to run a crypto trading firm, spent six months talking to the protocol's developers (including in-person at conferences), and finally sent them a malicious iOS app that exploited a separate bug in VSCode to get full access to developers' private keys. Like other businesses, hacking can support more complex operations as it matures: multi-month infiltration campaigns rather than opportunistic drive-by attacks. Crypto continues to serve a social purpose by being easy to steal ("Why did you rob crypto exchanges?" "Because wallet.dat's where the money is.") and easy to trace, so we get a preview of how hackers will go after the fiat economy.