Search Remains the Best Business in History

Plus! Diff Jobs; Equity Market Vigilantes; Scale; Volatility; Tariff Impact; Splits and Synergies

In this issue:

  * Search Remains the Best Business in History
  * Diff Jobs
  * Equity Market Vigilantes
  * Scale
  * Volatility
  * Tariff Impact
  * Splits and Synergies

Search Remains the Best Business in History

The status of "best possible business" is a moving target. "Put a stamp of approval on structured products that were basically designed around getting your stamp of approval" made Moody's and S&P lots of money in the 2000s, with the added benefit of continuously reinforcing their brand name. As it turned out, associating that brand name with these products eventually backfired. Fannie and Freddie also had a very nice business back when the government guarantee was merely implicit but the resulting cost-of-capital advantage was very real. Transforming assorted commodity food products, like corn syrup, food coloring, and caffeine, into Coca-Cola, the universal drink, was a contender for a long time, but hasn't produced lollapalooza results in a while. Some great companies are products of circumstance, like Chinese banks in the late 90s: run-proof because of government backing, with a low ceiling on what deposits earned, strict restrictions on savers' other investment options, and plenty of high-return infrastructure to finance. Now some of those same banks are the shock absorbers of choice for the Chinese government.

If you're trying to find an enduringly good business, you want one that tends to get better over time. There are plenty of ways that can happen: you could have high fixed costs and low marginal costs, such that with enough scale you can undercut any competitor; you might have compounding data (and marketplace) advantages, where usage of your product makes it better for the next user; and it would be nice if you had some kind of scale-driven price discrimination, where paying customers are encouraged to pay roughly what they think the product is worth. Best of all would be if this business were simultaneously positioned to take advantage of some secular economic or technological trend, and were also able to accelerate some of those trends. As a bonus, it would be great if this business led to good distribution for ancillary businesses, maybe creating opportunities to commoditize various complements and use them to accelerate the core business.

General-purpose web search has all of this. To build a search engine, the first thing you need to do is grab the results, i.e. download a copy of the entire Internet, so you know what the content is at every URL and can thus start to think about ranking them. But wait! That's a somewhat sloppy description of crawling: some kinds of content aren't searchable, or at least weren't very searchable when the search business got started: you don't need a copy of every video or audio file, nor do you necessarily start with every image. But "every" is actually harder to define even if we restrict ourselves to text: suppose someone created a page called IsItPrime.com, whose premise is that if you visit a URL like IsItPrime.com/2/, you get the answer "yes!" plus a "previous number" and "next number" button. The site caches a bunch of these results, but if a visitor hits a new page, it quickly checks whether or not that number is prime.[1] If there happens to be just one instance of whatever crawler is being used by a search engine, and that crawler insists on exhausting a site before visiting the next one, it'll just keep hitting "next" until the end of time or until the prime number site conks out. There's also plenty of redundant content, but you need a taxonomy of this—if public-facing user profiles on some site are identical except for the username, it's still useful to show them in search results when someone searches for that username; and even if thousands of sites host a text like the Declaration of Independence, each copy has different sidebar links and sits on a different domain that users may care about, so they all need to be indexed, too.
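One classic defense against traps like this is to ration crawl effort by domain instead of crawling depth-first. Here's a minimal Python sketch of that idea; the class and parameter names are invented for illustration, and a real crawler would layer politeness delays, robots.txt handling, and URL-pattern heuristics on top:

```python
# A toy crawl frontier with a per-domain fetch budget. Illustrative only.
from collections import defaultdict, deque
from urllib.parse import urlparse

class CrawlFrontier:
    def __init__(self, per_domain_budget=1000):
        self.queues = defaultdict(deque)   # domain -> URLs waiting to be fetched
        self.fetched = defaultdict(int)    # domain -> pages fetched so far
        self.seen = set()                  # exact-URL dedupe
        self.budget = per_domain_budget

    def add(self, url):
        if url not in self.seen:
            self.seen.add(url)
            self.queues[urlparse(url).netloc].append(url)

    def next_url(self):
        # Pick the least-crawled domain that still has URLs queued and
        # budget left. A trap like isitprime.com can mint URLs forever,
        # but it only ever gets `budget` fetches; finite real sites are
        # unaffected.
        live = [d for d in self.queues
                if self.queues[d] and self.fetched[d] < self.budget]
        if not live:
            return None  # frontier drained, or every domain is over budget
        domain = min(live, key=lambda d: self.fetched[d])
        self.fetched[domain] += 1
        return self.queues[domain].popleft()
```

The specific budget number doesn't matter much; what matters is that the frontier, not the individual site, decides how much attention any one URL pattern deserves.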

And then there's the question of how often to do this. Getting a one-time snapshot is one thing, but figuring out the cadence with which to visit different pages—or to check whether a temporarily-unavailable one is back again—is nontrivial.[2] Here's an early case where positive feedback loops from data start to kick in: your users will start to tell you that news pages need to be updated frequently when they do searches for a politician's name plus a term like "washington post" or "nyt." They'll tell you about new products from existing companies when lots of those users suddenly search for the product in question and don't find any good results. They'll share that message especially clearly when they engage in a classic yellow-flag search behavior: search for something specific, click nothing; search for the more generic part of the query (just "nyt," for example), then click on the result without coming back to attempt the search again. This implies that the user decided that the best search tool available was to use their own eyes on the site in question, implying that they know the search result should exist, and are confident they'll find it.
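The cadence problem has a simple core even though production systems are far more elaborate. A toy policy, with invented constants: halve the revisit interval when a page has changed since the last fetch, double it when it hasn't.

```python
# A toy revisit scheduler: pages that change between visits get recrawled
# sooner, pages that don't get visited less and less often. The constants
# and the content-hash change detector are assumptions for illustration.
import hashlib
import time

class RevisitPolicy:
    MIN_INTERVAL = 15 * 60            # 15 minutes: news-like pages
    MAX_INTERVAL = 90 * 24 * 60 * 60  # ~3 months: static archives

    def __init__(self):
        self.state = {}  # url -> (content_hash, interval_seconds, next_due)

    def record_fetch(self, url, content: bytes, now=None):
        now = time.time() if now is None else now
        digest = hashlib.sha256(content).hexdigest()
        prev = self.state.get(url)
        if prev is None:
            interval = 24 * 60 * 60                         # first visit: guess a day
        elif prev[0] != digest:
            interval = max(self.MIN_INTERVAL, prev[1] / 2)  # changed: come back sooner
        else:
            interval = min(self.MAX_INTERVAL, prev[1] * 2)  # unchanged: back off
        self.state[url] = (digest, interval, now + interval)

    def due(self, now=None):
        now = time.time() if now is None else now
        return [url for url, (_, _, t) in self.state.items() if t <= now]
```

The search-behavior signals described above would feed into exactly this kind of loop, pushing a page's interval down well before the content-hash check would notice on its own.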

But that's a tiny sample of the tricks you can do once you have a basic ranking system and you start putting it in front of users. Everything they do gives you some information about what to do next. Ironically, the worst search engine, as a business, would be the one that perfectly solved search on the first try: everyone clicked the number-one result and never had to do that search again. That search engine doesn't collect any new information from users, but only because it already knows everything.[3] Fortunately for search engines, the product wasn't perfect upfront, which meant that they could create a taxonomy of failures: someone clicking on the #2 or #3 result was one kind, and might be an expected one for queries that could be resolved in multiple ways ("churches near me" could list multiple denominations, not to mention a fried chicken chain). Someone doing a search and then rephrasing it implied some combination of the searcher's mental model of the engine being a little off and the engine's model of the searcher being wrong instead. And that raises a question with a multiplicity of answers: some fraction of users will just get their search wrong, but some of those apparently-wrong queries will be legitimate queries someone might make. Something like "cant translate latin" could either be a request for the Latin word that best translates to "cant" or a plaintive but misspelled search for other people who share the searcher's struggles to finish their Latin homework.
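To make that taxonomy concrete, here's a hypothetical classifier over a session of click logs. The labels and thresholds are invented, but each branch corresponds to a failure mode described above:

```python
# Illustrative only: label a search session by its failure mode.
def classify_session(events):
    """events: ordered list of (query, clicked_rank) pairs; clicked_rank
    is None for a search that got no click."""
    if not events:
        return "empty"
    queries = [q for q, _ in events]
    last_click = events[-1][1]
    if len(events) == 1 and last_click == 1:
        return "success"            # one query, top result clicked
    if last_click in (2, 3):
        return "soft_miss"          # ranking was close but not quite right
    if len(queries) >= 2 and last_click and queries[-1] in queries[-2]:
        # e.g. "nyt opinion tariffs" with no click, then just "nyt" and a
        # click: the user gave up on the query and navigated instead.
        return "generic_fallback"
    if len(set(queries)) >= 2:
        return "rephrase"           # someone's mental model was off
    return "abandoned" if last_click is None else "other"
```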

There are two main angles for solving this one:

  1. Better user-level data helps predict whether an ambiguous "bass" in a search query is something that goes well with rhythm or with white wine, or whether "polish" is a food or an ethnicity.
  2. Autocomplete is powerful not just because it speeds up search queries but because it provides real-time feedback about ambiguous searches.

Yet again, both of these work better with more users: you'll have more ways to cluster them based on interests and behaviors, using a few hints from new users to quickly map them to the tendencies of established ones. And for autocomplete in particular, it helps to have a wealth of search data (or even to reach meta-conclusions like "how many related queries do we need to look at before the first two words or so a user gives us will let us predict which queries they're going to end up making with 99% confidence?").
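A toy version of the data-volume point for autocomplete: even the dumbest possible suggester, one that just counts past queries and filters by prefix, gets sharper as the log grows. (Illustrative only; real systems add tries for speed, plus personalization and session context.)

```python
# Count-based query suggestion: more logged queries means better-calibrated
# counts per prefix. All names here are invented for illustration.
from collections import defaultdict
import heapq

class Autocomplete:
    def __init__(self):
        self.counts = defaultdict(int)  # full query -> times it was searched

    def log_query(self, query: str):
        self.counts[query.lower()] += 1

    def suggest(self, prefix: str, k: int = 5):
        prefix = prefix.lower()
        matches = ((n, q) for q, n in self.counts.items() if q.startswith(prefix))
        return [q for _, q in heapq.nlargest(k, matches)]

ac = Autocomplete()
for q in ["bass guitar strings", "bass fishing lures", "bass guitar strings"]:
    ac.log_query(q)
print(ac.suggest("bass"))  # ['bass guitar strings', 'bass fishing lures']
```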

A better search product is one that changes behavior, both for users and for site owners. The site owners, for example, want to be as explicit as possible to search engines about what they're offering, so search imposes a sort of bland but direct form of copywriting, where every restaurant menu that used to be called "our seasonal selections" or something is now just "menu." But that also means that people know what to look for: SEO actually standardizes idioms! Every search engine will benefit from this kind of thing, but at scale there's more room to detect the behaviors that are too on-the-nose and thus probably attempts to game the system. Higher expectations about what information will be available online also mean that more of it is—it's pretty hard, at least in big cities in the US, to find a business that has steady opening hours but doesn't have those hours easily available online.

Of course, all this is just a curiosity if there isn't enough money for the increasingly expensive task of actually getting this information, running a search algorithm, serving its results to customers, and continuously increasing the extent to which you do all of the above. Fortunately, search is an unbelievably great place to run ads, because the entire product is basically users telling the platform exactly what they're interested in at this precise moment, giving the platform an opportunity to sell that traffic to the highest bidder. Ads naturally imperil some of the incentives around responding to searcher behavior—any change that makes the organic results better now competes with the ads! On the other hand, the categories where ads make the most sense are the ones where the organic results are already heavily commercial: there just aren't that many hobbyists dedicated to listing all the available flights between two cities on a given date, or carefully enumerating the best credit card offers. So early on, advertising ends up minimally affecting the user experience, except that the commercial intent behind the information users are being presented with is made explicit in the search result interface. Over time, there's naturally a broad tradeoff between commercialization and user experience, albeit one that funds other products that are relevant to the user. The exchange rate for that tradeoff varies enormously from one query to another, and is always evolving. And there's implicit pressure to increase ad load from the organic search side: an under-monetized search query is an arbitrage opportunity, so the extent to which companies can build their business purely by ranking well for search terms is the baseline extent to which the search engine can go ahead and charge for that traffic without materially changing what users see.
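That "highest bidder" step has historically run through variants of a generalized second-price auction, where each winner pays roughly the bid of the advertiser ranked just below them. A stripped-down sketch, ignoring the quality scores and reserve prices real ad systems use:

```python
# Generalized second-price (GSP) auction, minus quality scores and
# reserves. Illustrative of the pricing rule only.
def gsp_auction(bids, num_slots):
    """bids: dict of advertiser -> bid per click. Returns a list of
    (advertiser, price_paid_per_click), best slot first."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    results = []
    for i in range(min(num_slots, len(ranked))):
        winner = ranked[i][0]
        # Each winner pays the next-highest bid, not their own; the last
        # winner with nobody below pays a nominal floor here (a reserve
        # price, in real systems).
        price = ranked[i + 1][1] if i + 1 < len(ranked) else 0.01
        results.append((winner, price))
    return results

# gsp_auction({"A": 4.00, "B": 2.50, "C": 1.00}, num_slots=2)
# -> [("A", 2.50), ("B", 1.00)]
```

One nice property for the search engine: under this rule, attracting more bidders per query mechanically pushes up second prices, without the platform setting prices by fiat.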

One way to look at this model is that the search business is a way to create a consumer and producer surplus by connecting people to the information they're looking for and collecting a cut of whatever wealth gets created this way. Which is obviously the ideal business to run in a world where GDP is still rising, data created per unit of GDP is still rising, and economies of scale on many dimensions still persist. Search is the prototypical platform business, but it’s a partly invisible platform—businesses and habits get built on the existence of search, and that’s self-reinforcing for a long time.


  1. Obviously, the unit economics deteriorate after a while. ↩︎

  2. Consider that second case, where a page used to have content, but on the next visit it returns a 404, times out, etc. One possibility is that it's really meant to be the most poignant HTTP response code: 410 Gone. But it might also mean that a page or an entire site happens to be in the middle of an update, which, especially in the early days of search, might have been done sloppily. So the right time to revisit that page is either very soon or never again. ↩︎

  3. This curse of perfection shows up in a few other places. If you make a trade that plays out absolutely perfectly—you read the tea leaves and figured out that a company had put itself up for sale or that a central bank had finally decided to devalue—then in a sense this idea was an asset on your balance sheet and making money just turned it into cash. You didn't know anything after the catalyst that you hadn't figured out beforehand. You learn the most from good businesses and bad trades. ↩︎

You're on the free list for The Diff. Last week, full subscribers read about how LLM chatbots combine the most and least viral product categories ($), the question of how many digital identities we need and how many we feel like managing ($), and steelmanning private credit ($). Upgrade today for full access.

Upgrade Today

Diff Jobs

Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:

Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.

If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.

Elsewhere

Equity Market Vigilantes

One feature of the Musk/Trump kerfuffle is that the market is both a referee and a participant in the match. There are two ways to look at Tesla's 14% drop in response to the breakup last week:

  1. This was fundamental analysis: 14% of the present value of Tesla consisted of some combination of favorable treatment from the federal government and the absence of gratuitous fines.
  2. This was expressive, ESG-style investing. Tesla's shares had been pushed up by people who liked Elon because of his new association with Trump, and those people were happy to punish him by selling the stock back down.

That second possibility is interesting because it's another tool Trump has: some of his associates have been happy to use his last name to make stocks go up, but it can work as a kind of financialized bully pulpit, too.

Scale

Why is Meta considering a $10bn investment in Scale AI? Scale provides a product that's necessarily commoditized: since data labeling for AI is trying to replicate the average person's thought process, there's a broad set of people who can do it. But there's also a cost to evaluating those people, and to scaling that operation. If it's Meta's view that human input will be a major constraint on future LLMs, it might make sense to lock up a lot of it as a kind of supply chain management move. This is in direct conflict with where Google and Richard Sutton think the next leg of progress will come from: AI agents engaging in continual experiential learning and environmental adaptation, prioritizing direct experience and environmental signals, not subjective human preference.
Disclosure: Long META.

Volatility

High-yield bond issuance is back to $32bn monthly, the highest since October ($, FT). Some of this is catching up after offerings were deferred in April, but there's also just a generally rapid return to normal. Spreads are about 100 basis points higher, though. One way to read that is that, all else being equal, the annualized risk of a wipeout is one point higher than it was before. The overall economy is more diversified than any given bond issuer, so this is a bond-specific risk and not a general measure of the impact of tariffs on the economy. But it still says that the uneven benefits of tariffs and other active economic management on the government's part lead to generally lower certainty, out of proportion to their direct cost.
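As a rough back-of-the-envelope for that reading (and it is only that: it assumes, among other things, zero recovery in default):

spread ≈ p(default) × (1 − recovery rate)

With recovery near zero, a 100-basis-point move in spreads maps to roughly a one-percentage-point move in annualized default probability; with a more typical recovery assumption of around 40%, the same spread move would imply closer to 1.7 points.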

Tariff Impact

A good case study in this generalized increase in default probability: it turns out that higher steel tariffs disproportionately affect canned food, because the tin-plate steel used in cans is disproportionately likely to be imported ($, WSJ). (And that problem won't vanish any time soon. From the CEO of Cleveland-Cliffs, when asked recently about having shut down a tin-plate steel mill: "It's done. When the horse leaves the barn, the horse does not come back to the barn.") So this illustrates the unpredictable impact of tariffs, and how it flows through to different products. You might think that if you ran a domestic produce company, your business would be unaffected by, or would even benefit from, tariffs. But if the produce in question happens to be disproportionately canned, and higher costs mean that the most profitable use of grocery store shelf space or last-mile trucking capacity is something other than canned food, you get unexpectedly dinged. There will be some tariff windfalls, too, but they won't massively affect people's planned capex, whereas the potential losses will.

Splits and Synergies

WarnerMedia and Discovery merged in 2022, betting that their combined libraries of prestige TV and various flavors of background-music-style viewing could be bundled in a way that kept cable revenue alive a little longer and made streaming take off a little faster. Now, they're splitting into separate streaming- and legacy-focused businesses. One thing that makes this kind of synergy play tough is that investors are fine with evaluating a growth company, fine with tracking the slow decline of a terminally shrinking one, but they get annoyed when they have to do both at once. Lenders, in particular, make it hard for a growth company to make big strategic bets that will take a long time to pay off, even if the cash flow from the legacy business can support them for a while. Sometimes, it's easier to split into two companies so they can each have the capital structure that makes sense for them, instead of having one combined business split the difference.