Get Used to a More Hallucinatory Search Experience

Plus! Diff Jobs; Cap Structure Arbitrage and Elon Musk; Active, Passive, and Risk; Convergent Evolution; Market Timing; It's All Ads!

The Diff May 28th 2024

Get Used to a More Hallucinatory Search Experience

Google has insisted on rolling out a new search feature that presents de-contextualized answers, based on data compiled by external providers who didn't even realize that they were doing R&D for Google. It's depriving the original content creators of the traffic they depend on, and throwing mass-produced, unverified information in the faces of users who never asked for it.

This is, of course, a summary of the state of Google in 2007, when they launched "universal search," incorporating images, books, and locations into their main search product. This was the end of the old model of Google as "ten blue links," which is another way of saying that Google has not had an unadorned ten-blue-links format for two-thirds of its existence.

There were many genuine concerns when Google started doing this. In local search, for example, companies like TripAdvisor, FourSquare, and Yelp were betting their business on having accurate metadata about every location they tracked: if TripAdvisor didn't notice that a restaurant had changed locations, Yelp was wrong about their menu, or FourSquare got their hours of operation wrong, their service would lose users. Google was less of a bet on the accuracy of that information and more of a bet on comprehensiveness, so it was entirely possible for Google to scrape a local competitor's data, get something wrong, and still deprive them of a click when customers just looked at the search results page instead of clicking through.

But scale changes things, and one thing it changes is that it creates a faster and more automated feedback loop between mistaken data, passive customer feedback, and accurate updates. If you search for a business's operating hours today, Google doesn't just give you their schedule—it'll also have a nice histogram showing average foot traffic to that location over time, and showing how busy it is in real time. If a company didn't update their website to share holiday hours, for example, the chart will instantly tell you if it's busier than usual because of the three-day weekend, or actually closed. It helps to have over a billion active Google Maps users, many of whom have opted in to, or failed to opt out of, continuously sharing their location data with Google.

This leads to a paradox for Google and for other big platforms: if they take their AI products seriously, they can improve them faster than anyone else because they get so much usage-based feedback. It's well within Google's financial and technological resources to subscribe to the Twitter firehose, check every single image to see if it's a screenshot of an AI-generated result, and either flag that result for improvement or note that it's fake.[1] And of course they're looking at how users interact with AI results, how they interact with regular results, how often they run the same search again with different phrasing, etc.[2]

But at the same time, it's guaranteed that at a given quality level, Google's AI will get more negative feedback than that of OpenAI, Anthropic, Perplexity, and the rest. There is no Perplexity-before-AI to remember, so there's nothing direct to compare it to. Bing is also safe; the jokes about Bing have been the same since it launched. Compared to Bing, Google is 1) a somewhat better product, and 2) a much, much bigger one. And even aside from the PR problem (people were already complaining in the early 2000s that Google used to be good but had fallen off lately), there's the financial cost. Launching a new feature has an opportunity cost because it displaces something that has been optimized to make as much money as possible with something less optimized, so it's really a bet on the long-term experience curve of whatever the new product is. For some categories, that's worked quite well: when Meta moved to short videos, it was also moving to a format where users are delivering the equivalent of either a thumbs-down or a fractional thumbs-up several times a minute by either continuing to watch their recommendations or switching to something else. Recommendation engines improve pretty quickly when a few billion people are providing this kind of feedback.

One group that has reason to complain about this is the publishers whose data has been used to train algorithms. It's absolutely true that the last time this happened, it was rough on publishers—it's hard to make a living running an ad-supported lyrics site if Google is licensing lyrics and showing them in search results, and it's hard to make money on local business reviews if Google is aggregating the information most people are looking for and displaying it with zero clicks required. Google's CEO claims that people are even more likely to click through AI-generated search results than traditional organic results, but doesn't really go into the other side of the issue: the businesses that are most hurt by Google surfacing more answers in search pages are the ones that were taking advantage of Google's prior willingness to send them traffic and customers' patience in seeing the answer they looked for buried in a sea of ads. (Google itself has plenty of ads, including ads in AI-based results, but even the harshest Google critic will admit that a contextual text-based ad is simply a nicer-looking user experience than a swarm of animated banners, and a lightbox ad with an "X" button so faint it might be an optical illusion.)

It’s likely that the clickthrough rate on search pages will decline over time, but the aggregate number of clicks is harder to model. Some searches are searches for exactly one piece of information. Given how many people's names autocomplete on Google to "net worth," "house," "wife," etc., and given that the most common search terms are typically things like "YouTube" and "Facebook," it’s definitely a common use case. AI can marginally help here, especially if you're having trouble navigating to exactly what you're looking for, but these searches are mostly unaffected. In many other cases, though, the point of search is to find some piece of longer-form content. There's a continuum of AI utility for searches like this; if you're asking how to filter on a column in Pandas, an AI-generated block of code is what you want. If you're looking for a more serious grounding in linear algebra, so you can turn machine learning into a kind of math instead of a kind of magic, you probably want a single coherent textbook that you're reading alongside an LLM chat window that helps re-explain different concepts for you. A decline in clickthrough rates for searches is entirely possible if it consists of 1) a large increase in searches that don’t really require a second click, and 2) a smaller increase in the complementary searches that do lead the searcher to visit the site or app that provides what they’re looking for.
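To make that arithmetic concrete, here is a minimal sketch with made-up numbers. The search counts and per-type click rates are illustrative assumptions, not Google data. It shows how the overall clickthrough rate can fall while the total number of clicks still rises.

```python
# Hypothetical numbers, chosen only to illustrate the arithmetic.
def summarize(one_shot_searches, deep_searches, one_shot_ctr=0.1, deep_ctr=0.8):
    """Return (total clicks, overall clickthrough rate) for a mix of searches."""
    clicks = one_shot_searches * one_shot_ctr + deep_searches * deep_ctr
    total = one_shot_searches + deep_searches
    return clicks, clicks / total

before_clicks, before_ctr = summarize(one_shot_searches=100, deep_searches=100)
# 1) a large increase in answer-only searches,
# 2) a smaller increase in searches that lead to a visit
after_clicks, after_ctr = summarize(one_shot_searches=300, deep_searches=120)

print(f"before: {before_clicks:.0f} clicks, CTR {before_ctr:.1%}")
print(f"after:  {after_clicks:.0f} clicks, CTR {after_ctr:.1%}")
```

With these inputs, clicks rise from 90 to 126 while the clickthrough rate falls from 45% to 30%. Whether the real mix looks anything like this is exactly the part that's hard to model.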

The apocalyptic model where AI tools eat all of the content and eliminate the economic incentive to make more of it is partly a failure of imagination, and partly over-indexing on what AI does well rather than what it does badly. A textbook written by someone who has a dogmatic view of how their subject should be taught is helpful because it's implicitly a list of all of the unnoticed misconceptions an independent learner might have, organized in roughly the order in which they'd have had them.[3] AI is good at telling you about the specific thing you're missing once you've identified it, but quite bad at suggesting the concept you don't realize you're missing.

AI is also not well-suited to lean-back entertainment, where the goal is to provide minutes or hours of entertainment rather than saving minutes or hours of less efficient searching. It's a good backdrop for this, especially in recommending small snippets of content which will have richer data per piece. But even if AI plays a big role in determining which Netflix movie you're watching, it's not close to replacing the movies themselves. (Sora looks cool, but there don't seem to be any deeply distressed deals for movie studio real estate. For what it's worth, Blackstone is a buyer. And given how much money they've put into datacenters, it's not as if they're oblivious to AI.)

AI is a complement to explicitly defined uncertainty. And, as with search before it, once we have a tool we'll find a lot more excuses to use it. "Near me" searches were not something the original Google could support at all; it had a link graph, but not a concept of "good place for a business lunch" or "cheapest decent burrito within driving distance." Google evolved, and the scope of Googleable questions grew with it. AI expands that set further. If it's an inexhaustible space (if we'll never get bored of asking for increasingly deranged DALL-E images or ever-more-esoteric Claude queries), then it will indeed take market share from other kinds of media. But most people don't spend most of their time using information products that require constant feedback from the user. Those are tools, not entertainment, and sometimes ping-ponging back and forth with an AI for an hour is all the evidence you need to know that it's time to brush up on theory and fundamentals so you get unstuck.

In the long run, there's a simple economic reason to expect more search queries to return generative answers: it's easier to get a perfectly optimized ad load by producing every search result on the fly, tailored for that exact user, and to incorporate commercial messaging along the way. But this isn't the end of the world for accuracy of search or for the business model of publishers. The inaccurate search results came from inaccurate information in the data the model was trained on, i.e. in at least some cases, goofy AI results have replaced equally goofy organic search results: no change in quality, but it saved the user a click. The load-bearing assumption behind AI as a catastrophe for publishers is that everyone has had a burning desire to learn about the world through chat rather than through internally-coherent, passively-consumed essays and books. And that just isn't an especially good bet.

  1. Presumably they're not also automatically generating community notes, and A/B testing different prompts for maximum persuasiveness, but I'm sure someone has idly mused about doing this. ↩︎

  2. This still requires a scramble, and presumably some Googlers have had their weekends ruined by needing to fix these results on a tight timeline. On the other hand, a company that's never ruining someone's weekend over an urgent problem is slowly ruining those same employees' evenings as they wonder whether their $500k in total comp is really worth the psychological toll of working at a company that doesn't especially want to win. The scramble is unpleasant, but high trust is forged in times of crisis, and big companies need a lot of it. ↩︎

  3. You can sometimes be lucky enough to find different textbooks with offsetting dogmas. In linear algebra, for example, if you don't like Linear Algebra Done Right you can always try Linear Algebra Done Wrong. ↩︎

Diff Jobs

Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:

Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.

If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.


Cap Structure Arbitrage and Elon Musk

Normally, capital structure arbitrage refers to looking at the full set of financial instruments a company issues, and finding cases where two of them imply different views about the future. Meme stocks' bonds, for example, tend to imply much higher odds of near-future insolvency than the meme stocks themselves. And sometimes there's a direct volatility bet to make: Magnetar had a famous trade where they bet that the highest-rated tranches of subprime-backed securities would default, but also bet that the lowest-rated portion would outperform. This bet sometimes gets talked about as some cynical attempt to manipulate the mortgage market (a hedge fund, even a large one, is not really in a position to do that). But it was really a volatility bet, that the performance of a given pool of mortgages was less predictable, for better or for worse, than asset prices implied. It worked well.

There's another, looser form of capital structure arbitrage: find a collection of companies that are all controlled by one person, but are owned to varying degrees by others. Focus on the companies that aren't sticklers for corporate governance. In this case, the goal is to bet on whichever part of that constellation will be sucking up resources from the other parts.

This is the right way to think about the $6bn valuation of Elon Musk's xAI. It's not the capitalized value of what the company has done so far, or even the value of Elon Musk's fame as a way to differentiate from other AI businesses. But it does make sense as a proxy for the net present value of Tesla, SpaceX, and Twitter talent that might be quietly repurposed to work on AI instead.

Active, Passive, and Risk

There's been a debate in Canadian finance in recent days over whether the Canada Pension Plan Investment Board, which manages over $400bn USD in assets, should have just stuck everything in an index fund instead of picking hedge funds, investing in private equity funds, investing in individual PE deals, etc. After all of this effort, resulting in $3.6bn in annual fees to outside investors and a staff of 2,100 internally, they underperformed their long-term benchmark by 0.1% annualized.

But this turns out to be a case of bad benchmarks, rather than bad strategy. The CPPIB benchmark is heavily weighted to equities and has a tiny fixed income allocation; it's the kind of total return you'd aim for if you were running a portfolio that had to have a fixed-income component but was going to do some interesting things with the rest of its capital. In the last ten years, their benchmark has lost value three times, and each time CPPIB has had positive performance. Meanwhile, the two worst years of relative performance, fiscal 2021 and 2024, were also the best years for the benchmark. So a more representative benchmark would have lower volatility and thus lower returns.

But there's a problem with that story, too: a notorious problem with private equity is that PE firms don't accurately mark their portfolios down during market declines. Miraculously, when nobody's buying companies because equity markets are down by half, PE firms happen to have selected companies that all lost less value. This is a hard problem to deal with. There is a statistical way to solve it: regress PE returns against different market factors to estimate what their likely underperformance was. But this presupposes that PE doesn't add any value, which makes it hard to justify charging or paying 2 and 20. So it's an emergent property of PE and pension fund assumptions that we will never know the true risk-adjusted return of private equity, but can be confident that it's worse than the stated numbers. The critics of CPPIB's investment approach might be right, and might be wrong, but anyone in a position to know for sure has a big financial incentive not to say so.
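For a sense of what that regression looks like, here is a minimal sketch on synthetic data. Every number is made up, the single "market" series stands in for whatever factors an analyst would actually choose, and adding a lagged market return is one common way to handle stale marks, not a description of how CPPIB or anyone else does it. The sketch also bakes in the assumption the paragraph flags: the simulated PE portfolio has no skill at all, just market exposure that shows up late.

```python
# Synthetic illustration: every number here is made up.
market = [0.05, -0.20, 0.10, 0.08, -0.12, 0.15, 0.03, -0.06, 0.09, 0.04]
TRUE_BETA = 1.2   # assumed true sensitivity of the PE portfolio to the market
SMOOTHING = 0.5   # assumed share of each quarter's move that is marked immediately

true_pe = [TRUE_BETA * m for m in market]
# Reported marks blend this quarter's true return with last quarter's.
reported = [SMOOTHING * true_pe[t] + (1 - SMOOTHING) * true_pe[t - 1]
            for t in range(1, len(market))]
now = market[1:]    # same-quarter market return
lag = market[:-1]   # previous-quarter market return

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Naive one-factor regression (no intercept): beta = <r, m> / <m, m>
naive_beta = dot(reported, now) / dot(now, now)

# Two-factor regression on current and lagged market returns,
# solved from the 2x2 normal equations by Cramer's rule.
a, b, c = dot(now, now), dot(now, lag), dot(lag, lag)
p, q = dot(reported, now), dot(reported, lag)
det = a * c - b * b
beta_now = (p * c - q * b) / det
beta_lag = (a * q - b * p) / det

print(f"naive beta:      {naive_beta:.2f}")
print(f"beta incl. lag:  {beta_now + beta_lag:.2f}")
```

In this toy setup the naive regression reports a beta well below the assumed 1.2, which is the "miraculously lost less value" effect, while the current and lagged coefficients sum back to the true exposure. Real data has noise, fees, and possibly genuine alpha, and that is where the argument starts.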

Convergent Evolution

The NYT has a good interview with Netflix co-CEO Ted Sarandos: at growing companies that track a lot of user data, narratives can lag behind their actual behavior because they're always adding a new, larger cohort of customers with more average tastes. So Sarandos corrects at least one part of the content thesis:

[I]n 2012, I said we’re going to become HBO before HBO could become us. At that time, HBO was the gold standard of original programming. What I should have said back then is, We want to be HBO and CBS and BBC and all those different networks around the world that entertain people, and not narrow it to just HBO.

One notable thing about this is that HBO, too, is not just trying to be HBO; their main streaming offering bundles in plenty of Discovery content that is not as highbrow or edgy as a typical HBO production. When media companies are young, they highlight the kinds of content you can't get anywhere else, but eventually they grow to the point that they need the sorts of things you can find everywhere else, just more conveniently or with better production values, because "everywhere else" describes all of the alternatives that are providing a satisfactory experience for their prospective customers.

Market Timing

Gamestop has completed its 45m-share at-the-market offering, raising $933m in a week and a half. The company "intends to use the net proceeds from the ATM Program for general corporate purposes, which may include acquisitions and investments." Shares opened up 22% and have since pulled back a bit. Part of the rally is a reaction to the fact that there will be less immediate selling pressure now that the offering is complete, but another reason is that meme stocks have many ways to turn $1 of cash on the balance sheet into much more than $1 of market cap. There's a sense in which the meme stocks, most of which were different companies in different industries, have all converged on AMC's model: sell a ticket to someone who's going to get entertained in return. AMC does this in theaters, but Gamestop is doing it at the New York Stock Exchange.

It's All Ads!

PayPal is planning its own ad business, using purchase data to target ads. In general, ads monetize better as you get deeper into the funnel: a banner on a news story makes less money than a product recommendation in search (whether that search is a general-purpose search engine or the result of doing a search on a site). Related products and bundled discounts can make even more money. But once the customer has decided what to buy and is checking out, the utility of ads goes down because they trade off higher basket size against a higher shopping cart abandon rate. So PayPal has data, and scale, but the unsolved problem for them is inventory.
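The checkout tradeoff is easy to put in numbers. This is a minimal sketch with hypothetical figures (the basket sizes and abandonment rates are assumptions for illustration, not PayPal data), and it ignores whatever the advertiser pays for the placement.

```python
# Hypothetical figures, for illustration only.
def revenue_per_checkout(basket, abandon_rate):
    """Expected order value per shopper who reaches checkout."""
    return basket * (1 - abandon_rate)

def break_even_basket_lift(abandon_rate, added_abandon):
    """Fractional basket increase needed to offset extra abandonment."""
    return (1 - abandon_rate) / (1 - abandon_rate - added_abandon) - 1

baseline = revenue_per_checkout(basket=100.0, abandon_rate=0.20)
with_ads = revenue_per_checkout(basket=104.0, abandon_rate=0.25)
needed = break_even_basket_lift(abandon_rate=0.20, added_abandon=0.05)

print(f"baseline: ${baseline:.2f}, with ads: ${with_ads:.2f}")
print(f"basket lift needed to break even: {needed:.1%}")
```

Under these assumptions, a 4% bigger basket doesn't pay for five extra points of abandonment: expected value per checkout drops from $80 to $78, and the basket would need to grow by about 6.7% just to break even.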