In this issue:
- What We'll Lose With AI—Any labor-saving technology replaces the cost of labor, but also the indirect upside of having performed that labor. And it often turns out that this kind of tradeoff, while ultimately worth it, involves losing some load-bearing experiences.
- Platforms and Reputation—Sometimes the math flips from a positive ROI from working with a company to a negative ROI from avoiding them.
- General-Purpose Technologies—One test of a new communications tool is how well it can be applied to nefarious behaviors.
- Court Politics—The ROI on a third party needs to be high when the investment in it is implicitly in the billions.
- Post-Ozempic Snacking—Fewer calories, same basic model.
- The Next Phase of Crypto Treasury Strategies—Does it still work if it's even more internally inconsistent?
What We'll Lose With AI
You can do some exciting futurism by graphing improvements in model performance against the log of some input, but there are necessary limits to extrapolation. Gutenberg could probably imagine a future where anyone can buy a copy of the bible in their vernacular language, but probably couldn't have imagined how many of them would die over differing interpretations of it. Henry Ford could imagine middle-class families driving to a park, but probably couldn't have imagined the way cities would be reconstructed, or the emergence of implicitly car-oriented businesses like fast food chains and supermarkets. The only really glaring exception is Gordon Moore predicting, in 1965, that we'd eventually have home computers (but that they'd be networked), automatic controls for automobiles, and personal portable communications equipment.[1] But even in that case, it was hard to predict just how much value would be captured by software, and how much would happen as a result of software companies subsidizing their complements.
But you can get plenty of economically-aware science fiction potential out of AI even if you assume that every lab hits a wall with model capabilities, and that what we're left with is figuring out all the potential of the current set of tools, noting that even if capabilities get worse, pricing probably gets better over time. (Depending on how you do the math, inflation in LLM land is running at about -90% annualized, i.e. at a given performance threshold every year you can buy about 10x as many tokens.[2])
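That -90% figure is just the annualized form of a 10x-per-year price drop. A minimal sketch of the arithmetic, using invented per-token prices purely for illustration:

```python
# Illustrative only: the prices below are hypothetical, not any lab's actual pricing.
def annualized_inflation(price_then: float, price_now: float, years: float) -> float:
    """Annualized rate of price change: (p_now / p_then)^(1/years) - 1."""
    return (price_now / price_then) ** (1 / years) - 1

# If a fixed quality threshold cost $10 per million tokens two years ago
# and $0.10 today, that's a 100x drop over two years:
rate = annualized_inflation(10.0, 0.10, 2)
print(f"{rate:.0%}")  # -90%, i.e. each year a dollar buys ~10x as many tokens
```

The same compounding logic is why the exact figure depends heavily on which model pair and which time window you pick, per the caveats in footnote 2.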
The deployment of any new technology, particularly a labor-saving one, ends up revealing weird dependencies between the labor itself and other outcomes. The US passed 50% urbanization roughly a century ago, and well past that point many of the jobs in cities were in manufacturing; working around the house was also physically demanding when doing laundry took a full day of scrubbing, and when "chicken" was something you'd physically kill and pluck rather than something you could have delivered to your door, in convenient nugget form. For most of human history, "calories-in/calories-out" was not a model that governed physical appearance but was a hard constraint on how many people could exist in a given location before some of them were condemned to starvation. The current setup is, obviously, much better in many respects, but has also led to a situation where companies can achieve eleven-figure valuations through simulated manual labor in the form of gyms and home exercise equipment. And those numbers are dwarfed by the value of selling an injection that just reduces people's desire for food.
The current system is, obviously, much better than the previous one. It's a lot easier to slightly adjust your lifestyle and lose a few pounds than it is to regrow the limb you lost in a farming or industrial accident; getting high levels of calorie output is pretty cheap, and you actually get paid to reduce your calorie input. But it turned out that plenty of social norms and biological set-points came from a world of scarce calories and constant physical labor, and we're still adjusting.
Schools already have some experience with minor examples of this: kids have less of a need to write things out by hand, but they develop better fine motor control by learning cursive even if they'll almost certainly never use it as adults. And at this point an analog clock is a decorative statement rather than a useful tool, but it's still helpful to be able to understand what "eleven o'clock" means as an indicator of what direction to look in.[3] Calculators are an interesting one, because while quick mental math often comes in handy, that's most true in adversarial cases. For the vast majority of people, cheap pocket calculators and then ubiquitous smartphones mean that it is, in fact, a worse use of time than it used to be to get really good at multiplying pairs of three-digit numbers in your head, or quickly guesstimating cube roots.
And here we run into consequences! In a pre-calculator era, or even an era where people were starting to use mechanical calculators to automate some parts of a calculation, there was implicitly human review at every step, and that meant someone looking at intermediate answers and potentially noticing if they looked a little bit off. If you're automating that same calculation, you might miss the fact that you'd transposed digits somewhere, or accidentally added or dropped one. There would, of course, be fewer arithmetic errors, but in some domains the extra checking from humans looking at every step probably produces more bits of correctness than human fallibility at math adds. A pre-calculator era implicitly ran every program through a debugger, albeit a fallible one.[4]
We're very early in dealing with the way LLMs affect the effort/output equation when it comes to text. Obviously, a tailored cold email doesn't have the impact it did a few years ago, since LLMs can produce pretty good ones. But that effect comes from salespeople who are either improving a process of spraying out lots of formulaic emails or who are speeding up the process of writing a more crafted one. What happens when more of the workforce has always had adequate prose available on tap?
I think about this a lot right now because I'm a parent, my kids are mostly home for summer vacation, and one of them, our nine-year-old, is spending a lot of time interacting with ChatGPT. As a result of this, my wife and I have received a fair number of lengthy emails on the topics of babysitters, sibling squabbles, whether or not it's appropriate to stay up until midnight on Independence Day, etc. Many of these messages use em-dashes about as prolifically as The Diff (which has been doing so for some time), and they often note that things aren't just X—they're actually Y. In terms of structure, vocabulary, etc. they're about what you'd expect from an above-average college graduate, but pretty far beyond what a typical elementary school student would produce.
But that college graduate got to the point where they could produce a well-structured argument in a totally different way. It was some combination of underlying verbal ability, accumulated reading, and conversation. And that means the ability to craft good prose was proof-of-work, but also that this ability was bundled with and generally a result of the experiences that came along with it. If you've read Jane Austen, you've learned something about how to put together a sentence, and you've also, by necessity, learned something about human behavior—if Emma took the emotional bullet for you by sneering at Miss Bates, you're a bit less likely to treat the Miss Bateses in your life that way. If those verbal skills come with one-on-one verbal jostling, you've faced a different incentive structure that still pushes in the same direction, of building the best possible mental model of other people's minds. Bodybuilders who use steroids incautiously have an analogous problem, where their muscles get stronger faster than their tendons do, and they end up being capable of pushing enormous weights but not capable of holding them. Having this ability verbally—being able to produce an argument that's at or above the capability of most of the authority figures in your life—has all sorts of weird consequences. For one thing, it puts a premium on in-person charisma as opposed to writing skill, since that writing skill is now available online for free. It also means that, all else being equal, authority figures should be less willing to bend the rules when there seems to be a good reason to do so, because the process of finding and articulating those reasons has gotten dramatically cheaper.[5]
This proof-of-work shows up in another way: a particular tactic you can use to make readers or listeners feel good about themselves is to tell a story where you share enough details that someone who's plugged-in knows exactly who you're talking about and everyone else doesn't, or to make a literary allusion that some people will get, others will realize they're missing, and others won't pick up on at all—this one creates a whole hierarchy, where the person who sees that there's a reference gets to feel superior to someone who misses it but not as good as someone who gets it right away. Obscure-but-detectable references are fan service for people whose fandom includes real life. It's easier to create these with LLMs, but also harder to hide them. This tweet is a great example: Dylan Matthews wants to refer to a book generally enough that he's not trashing a specific author, but also to indicate that it was taken seriously by critics. o3 is happy to take the criteria he named, enumerate every book that qualifies, then check his Goodreads and see that he's actually reading it. A diligent cyberstalker could have accomplished this kind of thing earlier, and Encyclopedia Dramatica and Kiwifarms are both testaments to the fact that some people are motivated to do exactly this. But when the cost is lower, it happens more often, and at some point the marginal cost actually flips social norms around. A few decades ago it would have been extremely weird and concerning behavior to hire PIs to track one's romantic partner everywhere they went, but this is either normal or in the process of getting normalized because of iOS's location-sharing feature.[6]
So what we'll miss is, ultimately, very similar to what we lost when service jobs took over and someone could do a hard day's work without breaking a sweat or risking being maimed. We lose an assumed connection between effort and output. And how could we not—that's just another way of saying that LLMs are a technology that saves us labor costs, which is another way of saying that they're a technology with any use whatsoever. But, as you're enjoying the fruits of this labor-saving, it's worthwhile to ask what purposes that labor had other than producing its output. Because even if they're small, they'll matter in the aggregate, and we'll miss them, and need to adjust for them, when they're gone.
Larry Page was also extremely prescient about what the ultimate version of Google would look like in this 2000 interview: an artificial intelligence system that "understand(s) everything on the web", can "answer any question", and can "understand exactly what you wanted" rhymes quite a bit with the LLM-mediated world we live in today. That system was indeed trained by understanding everything on the web, and is getting closer and closer to answering any question with the exact specifications the end user is looking for. ↩︎
As with any inflation metric targeting a rapidly-changing product, you'll get measurement error no matter what metric you choose, and for a long time series you'll have to chain together a bunch of different products that aren't directly comparable. Reasoning models by definition produce many more tokens, so in order to get the "same" output (an answer, albeit generally a better one), you are inherently consuming more tokens, potentially even 10x as many. You can't buy a Model T today, but it's interesting to know how much more expensive a Tesla Model 3 is, and at some point you have to put a value on things like a built-in entertainment system, the fact that you don't have to hand-crank a Tesla to get it started, or the fact that the Model T was not configured to produce fart sounds. You can chain together comparisons by looking at the price premium new cars with extra features had compared to prior models that were still being sold new at the same time, but you also have to be careful as some features—"this one comes with a weird strap that will keep you from flying straight through the windshield in the event of an accident!"—become standard. In AI, this is arguably harder. The free models offered by the big labs are generally higher-quality, but cheaper to run, than the ones OpenAI was paywalling at the time that ChatGPT became the fastest-growing consumer product in history. ↩︎
At least, when I was in early elementary school in the 90s, this was the explanation a teacher gave me for why we were going to take a quiz on reading analog clocks. For the record, while this is something I'm capable of doing, I would feel pretty goofy telling someone to watch their six. ↩︎
You can build systems around this exact problem, by producing warnings for implausible intermediate calculations. If you have a pricing tool that estimates how much it'll cost to redo the floors in someone's house, for example, you might throw in a little warning if it turns out that the house has higher square footage than most major cities or if the pricing of flooring is suspiciously low or high. But it's incredibly hard to set these thresholds correctly—every time I wire money or make a large trade in an illiquid stock I cheerfully click past a bunch of warnings that This Could Turn Out To Be a Very Poor Decision, but since these are on by default I've learned to ignore them. ↩︎
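The footnote's idea can be sketched in a few lines. This is a hypothetical example (the function name, thresholds, and price ranges are all invented): an estimator that still returns an answer, but attaches warnings when an intermediate value looks implausible.

```python
# A minimal sketch of plausibility checks on intermediate values.
# All thresholds here are invented for illustration.
def estimate_flooring_cost(sq_ft: float, price_per_sq_ft: float):
    """Return (estimated cost, list of plausibility warnings)."""
    warnings = []
    if sq_ft > 50_000:  # larger than almost any house
        warnings.append(f"square footage {sq_ft:,.0f} looks implausibly large")
    if not (0.50 <= price_per_sq_ft <= 50.0):  # assumed 'typical' price band
        warnings.append(f"price ${price_per_sq_ft}/sq ft is outside the typical range")
    return sq_ft * price_per_sq_ft, warnings

# A transposed-digits-style error: 2,000 sq ft entered as 2,000,000.
cost, flags = estimate_flooring_cost(2_000_000, 4.0)
```

The hard part, as the footnote notes, isn't writing the checks, it's calibrating them: warnings that fire on every wire transfer train users to click past all of them.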
Think of how your behavior could change if "I'll be there in ten" were treated as a legal contract and you could get sued for being stuck too long at a red light. It wouldn't change how you interact with friends at all, but with strangers you'd suddenly be a lot more cautious and legalistic. ↩︎
In a way, this is a throwback to pre-urban social norms, where you would generally be able to locate someone by asking around, there being only so many places they could be and few people who wouldn't know them. ↩︎
You're on the free list for The Diff. Last week, paying subscribers got two S-1 teardowns: McGraw-Hill's mostly successful effort to turn a legacy textbook publishing company into a subscription data service ($), and Figma's incredible numbers ($), plus a nice trick for capturing the upside of data gravity without the usual efforts that requires.
Diff Jobs
Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:
- Ex-Optiver/DRW quants with over a decade of experience in HFT and AI are reimagining time series forecasting from first principles. They are building a research lab, initially monetized via derivatives trading. The team is hiring a founding engineer (Python/C++/Rust; distributed compute, ML infra) and a founding AI researcher to rethink how machines model the future. No finance experience needed. (SF)
- Ex-Citadel/D.E. Shaw team building AI-native infrastructure that turns lots of insurance data—structured and unstructured—into decision-grade plumbing that helps casualty risk and insurance liabilities move is looking for a data scientist with classical and generative ML experience. (NYC, Boston)
- An OpenAI backed startup that’s applying advanced reasoning techniques to reinvent investment analysis from first principles and build the IDE for financial research is looking for software engineers and a fundamental analyst. Experience at a Tiger Cub a plus. (NYC)
- A Google Ventures-backed startup founded by SpaceX engineers that’s building data infrastructure and tooling for hardware companies is looking for a product manager with 3+ years experience building product at high-growth enterprise SaaS businesses. Technical background preferred. (LA, Hybrid)
- Deerfield-backed, Series A company building agents for healthcare administration (prior authorization, eligibility checks, patient scheduling) is looking for a senior software/AI engineer to build backend services and LLM agents. Experience building and monitoring production-quality ML and AI systems preferred. (NYC, Hybrid)
Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.
If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.
Elsewhere
Platforms and Reputation
One of the best business situations to be involved in is one where there's upside to being a customer, and every incremental customer creates downside to not being one. At some point, refusal to use social media is socially isolating, and insistence on not using AI means some combination of spending more of one's life than necessary on boring Fermi estimates or just skipping the estimates entirely. This shows up in an interesting way for Amazon: AG1, formerly Athletic Greens, listed its products on Amazon in part because resellers were either offering discounted products or counterfeit ones ($, The Information). A site like Amazon is implicitly close to a public utility—for most discretionary spending, looking up a product on Amazon is pretty close to asking whether or not it can be purchased online at all. The more it becomes a default—i.e. the more big brands use the site—the stronger this presumption is. And that means that the merchants who are reluctant to move onto Amazon are suffering chronic damage to their brand, and have exactly one way to align Amazon's incentives with their own.
If this were deliberately engineered, it would look pretty anti-competitive, but it's an emergent property of having fairly high fixed costs and relatively lower marginal ones, and an economic incentive to build brand equity. There is some consumer surplus to be had in bundling product discovery, search for known products, and delivery into the same business, but the magnitude of those gains means that it's expensive not to opt in.
Disclosure: long AMZN.
General-Purpose Technologies
Colombia has captured an unmanned submarine that used Starlink to send and receive data. A tool designed to decentralize communications and make it harder for governments to interfere will naturally be used by some people who are basically recreationally paranoid about infringements on their rights, and also by people whose rights very much need infringing. A story like this is always ambiguous, because aggressive privacy-seeking behavior is visible to both signals intelligence types in governments and to the providers of such technology. So it could be a story about an Elon Musk-founded company blundering into a negative-net-utility use case—but could also be a case of a company that does a lot of business with the government having an informal policy of mentioning to relevant governments when there's someone in the middle of nowhere who needs data access for who-knows-what-reason.
Court Politics
A new third party is not necessarily a bad idea in the US, but it's important to know what it's for. The effect of such a party is usually to take one political coalition, chip away at some of its interest groups, and force it to move in the direction of those interest groups if it's going to have a chance electorally. So in New York, the Working Families Party typically endorses the Democratic nominee for higher offices but sometimes runs its own candidates locally; the Conservatives actually pulled the mayoral election leftward in 1965, when they nominated William F. Buckley (when asked what the first thing he'd do after winning was, Buckley said "Demand a recount"). The net effect of these parties is that they're a form of light political extortion, sometimes forcing mainstream parties to choose between losing because they went too partisan or losing because someone else peeled off a little bit of their base.
The immediate effect of Elon Musk's proposed "America Party" is that it has really annoyed Donald Trump, who is actually in a position to do something about this by reducing Musk's net worth. The market is already factoring this possibility in, with Tesla shares down about 7% this morning. In a way, this makes the party the best-funded political organization in the US: Musk has personally put about $9bn of his own money into it in the form of mark-to-market losses, and presumably expects some kind of return on investment.
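The ~$9bn figure is a straightforward mark-to-market calculation. A back-of-envelope version, where the stake fraction and market cap are rough assumptions rather than reported figures:

```python
# Back-of-envelope mark-to-market loss; inputs are rough assumptions.
musk_stake = 0.13      # approximate fraction of Tesla that Musk owns
market_cap = 1.0e12    # approximate Tesla market cap, in dollars
one_day_drop = 0.07    # the ~7% move cited above

loss = musk_stake * market_cap * one_day_drop
print(f"${loss / 1e9:.0f}bn")  # ≈ $9bn
```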
Post-Ozempic Snacking
Snack food brands are selling smaller packages of existing snack foods ($, WSJ), partly because it's one more avenue for finding the exact match between consumer impulse behavior and contribution margin and partly in recognition of the fact that some of their customers are simply going to consume fewer calories per day than they used to. Snack foods are an arms race where, over time, manufacturers optimize every tiny feature of the appearance, taste, and texture of their products, but they've now optimized them for what's no longer quite the standard palate.
The Next Phase of Crypto Treasury Strategies
The crypto treasury strategy is starting to show its age ($, Diff), but it's still such an easy way to potentially engineer a stock price pop that companies continue to try it out. Sometimes, there's a twist: one tiny biotech company is buying crypto and issuing more stock, just like all the rest, but plans to issue that stock in order to fund its core business rather than financing a pivot to being a closed-end fund that's less liquid and less tax-efficient than competing ETFs. Matt Levine sometimes talks about a model of retail investors where they know that some of the trades they make are pump-and-dump trades, but they're in the market for some combination of the thrill of a gamble and the expectation that they'll cynically sell to some future true believer. But it's one thing for that to be an emergent market behavior, and another for it to be part of a company's approach to corporate finance: this plan is basically saying that the stock price will be artificially high, so capital costs for the underlying business will be artificially low, which only makes sense if the company expects the crypto treasury strategy to underperform their core business over time—in which case the only justification for using that strategy in the first place is that the marginal buyer of the stock won't understand this.