AI Turns UGC into a PVP Zone
Over the weekend, Twitter started severely rate-limiting tweet consumption. Leaving aside the fact that this happened the day Twitter's contract with Google Cloud Platform lapsed, and that Twitter had previously stopped paying its cloud bills, it does line up with actions other companies have taken. Reddit infamously raised the price of its API, in part as a reaction to being scraped to train AI models. Stack Overflow wants to charge AI companies for using its data. Music labels don't want their music used for training. Voice actors are discovering that decades-old contracts have given companies the right to use their voices to train AI models that compete with them ($, FT). In the other direction, Microsoft recently raised the price of its search API, possibly to recoup higher per-query costs from incorporating more complex models into traditional search results, and perhaps also as a way to shake loose some search market share that could go directly to Bing.
To borrow an analogy from gaming: running a user-generated content site used to be a player-versus-environment area, where the threats were external and low agency—trolls, spammers, garbage content. But now it's a player-versus-player zone, where the threat is some specific agent with a definite plan to wreck one business model and replace it with another. PVP zones can be unpleasant, with lots of annoying sneak attacks and dubious strategies. But they're more interesting, because there's a thinking agent on the other side.
Training large language models was viable in part because there was so much human-generated, machine-readable text out there, available to anyone who could crawl a webpage. And it appears that one consequence of the growth of these models is that we'll reach Peak Freely-Available Purely-Human Content, as two forces collide:
- Certain kinds of human-generated text might as well exist in the latent space accessible to a model rather than as the output of a person. Many Stack Overflow-style questions can be trivially answered by ChatGPT (more on this fact, metrics behind it, and its consequences in a future post!), though the best explanations are still from human beings. If more queries get sent to a language model rather than to a Q&A site, the growth of long-tail user-generated content stalls.
- The existence of AI models means that openness is a bigger liability for user-generated content companies. The existence of a successful company in some category indicates demand for the content, so these are the easiest business models to drop into AI. "Quora, but there's no cold-start problem because our LLM can offer an instant answer" is one of dozens of narrow-but-accurate ways to describe ChatGPT as a very valuable product.
That's unfortunate. It's easy to over-extrapolate, though: the companies most affected by AI have to think about defense as well as offense. Defense means reducing the rate at which AI-based competitors can render them obsolete, or at least getting well paid to watch it happen. Offensive action (building their own chatbots and generative AI experiences, and developing their own loop for capturing unique user-generated content and getting the maximum value from it) takes longer. In a way, it's analogous to the way neighborhoods change as property values rise: one symptom of this is lots of empty storefronts, because higher-variance property values increase the option value of keeping a storefront empty a bit longer.
For services like Reddit and Twitter, it's a particularly tricky situation. Their big asset is their community, but a fair amount of the monetization of that asset comes from people with no investment in the community but who are willing to tolerate ads in exchange for the information they want. So the grand bargain for such a site looks like this:
- It's a money-losing proposition to build it entirely on behalf of its audience. If we use the standard rule of thumb that ~1% of people create content and only ~10% will upvote/downvote/share content, while the rest are completely passive consumers, then a site whose audience is only the active users has given up roughly 90% of its pageviews. (And these sorts of power users are probably disproportionately likely to use ad blockers.)
- Ads slightly annoy everyone, but they're a bit less annoying to drive-by visitors who just want to get a specific answer. So these users end up subsidizing the operating costs of a community website, while the community's content creates the demand for traffic.
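To make the arithmetic behind that rule of thumb concrete, here is a minimal sketch. It assumes the ~10% of engaged users includes the ~1% of creators, and that every user generates the same number of pageviews unless told otherwise; the function names and the `views_per_user` weights are hypothetical, not anything a real site reports.

```python
def audience_split(total_users, creator_share=0.01, engaged_share=0.10):
    """Split an audience into creators, other engaged users, and passive readers.

    Assumption: engaged_share (voters/sharers) already includes the creators.
    """
    return {
        "creators": total_users * creator_share,
        "engaged_only": total_users * (engaged_share - creator_share),
        "passive": total_users * (1 - engaged_share),
    }


def pageviews_forgone(split, views_per_user=None):
    """Share of total pageviews that come from passive readers.

    views_per_user is a hypothetical weighting; the default treats every
    user as generating the same number of pageviews.
    """
    if views_per_user is None:
        views_per_user = {"creators": 1.0, "engaged_only": 1.0, "passive": 1.0}
    views = {group: split[group] * views_per_user[group] for group in split}
    return views["passive"] / sum(views.values())


split = audience_split(1_000_000)
print(f"passive readers: {split['passive']:,.0f}")
print(f"pageviews forgone without them: {pageviews_forgone(split):.0%}")
```

Under equal per-user pageviews, the passive share of traffic is just the passive share of the audience, which is where the 90% figure comes from. If active users browse much more heavily than drive-by visitors, passing a larger weight for them in `views_per_user` shrinks that number, but passive readers still account for most of the traffic unless the gap is very large.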
You can't replace the Twitter experience of an active Twitter user with bots (at least until someone turns "heaven-banning" into a real feature rather than a thought experiment). But you can replace the experience of someone who browses Twitter to read jokes, or who uses Twitter Search as a real-time news ticker. In one sense, these passive users are completely worthless, since they don't create any of the content the site relies on. In another sense, they're pure profit—all of their usage is readily monetizable, and since they don't interact with many features and will just close the tab if they can't find what they need, support costs are minimal.
One vision of the future is that the Internet separates into two different spheres: most of the content is custom-generated for individual consumers, tightly tailored to their needs, continuously updated based on their feedback, and utterly disposable. And then there are niches where human-to-human conversations still happen, and where the content is, in fact, original. AI models can produce content arbitrarily close to what a given user is looking for, while real humans are often less cooperative.
Another vision is that there's a hybrid model, where companies embrace the fact that human-generated content a) has less scarcity value in a world where things that look like the output of hours of work from talented people can be produced in seconds, but b) is valuable as a unique input into such content creation. That's the direction The Diff has been moving in; one of our experiments in this area is DiffGPT ($), a natural language search interface for the Diff archive. Scrapers don't go behind the paywall, so an LLM won't have all of this content unless its developers specifically go to the trouble of getting it (ask us about our enterprise-tier API pricing, so we'll know if we should start working on an API).
Any business that can be characterized as AI-food can also be characterized as an AI-ranch-in-the-making. The human input is still required, but it's capex rather than opex or COGS. And that changes the equation a bit: if the content your site encourages is meant to be an asset that makes AI tools built off of it more valuable, then the counterpoint to endless seas of AI-generated fake content is a higher premium for the best real content. Reid Hoffman and Tyler Cowen recently talked about how AI is good at answering questions but not at asking them. As The Diff noted on Saturday, that makes perfect sense given how the models are constructed: they're trying to predict the highest-probability next token, i.e. they're relentlessly searching for a local maximum. But good questions are a way to jump out of the local maximum, and to draw attention to the improbable. They're a way to jack up the in-this-case-aptly-named perplexity for a bit, in order to consider a broader set of possibilities and then narrow down to the right one.
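For readers who haven't run into the term: perplexity is the exponential of the average negative log-probability a model assigns to each token in a sequence. The sketch below computes it from a list of per-token probabilities. The two probability lists are made-up numbers for illustration only, not output from any real model.

```python
import math


def perplexity(token_probs):
    """Perplexity = exp(average negative log-probability per token)."""
    if not token_probs:
        raise ValueError("need at least one token probability")
    avg_neg_log_prob = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_neg_log_prob)


# Hypothetical per-token probabilities under some language model.
predictable_answer = [0.9, 0.8, 0.9, 0.85]   # every token is the "obvious" next one
surprising_question = [0.9, 0.05, 0.1, 0.85]  # a couple of improbable turns

print(f"predictable: {perplexity(predictable_answer):.2f}")
print(f"surprising:  {perplexity(surprising_question):.2f}")
```

A sequence built entirely from high-probability tokens scores close to 1, while a sequence with a few improbable tokens scores higher. In that narrow sense, a good question is text the model would not have predicted.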
Disclosure: Long MSFT.
This is a perfectly reasonable approach for many kinds of companies. Horse-based transportation was a net beneficiary of the rise of trains, even though trains were more efficient for long-distance transportation, because trains also created much more demand for last-mile delivery. As long as the companies in question were aware of this, they had time to transition to a model that was more suited to a newer technological paradigm and that benefited from their existing network effects. This is not a hypothetical example; two $100bn+ companies, American Express and Wells Fargo, started out in the business of moving high-value goods by horse.
This is one reason empty storefronts are associated with the cheapest and most expensive commercial real estate. There's a lot of uncertainty about how Fifth Avenue's foot traffic and social cachet will change over time (what if all the rich people stay in Miami!?), and in a poorer neighborhood, there's also uncertainty about whether things will bounce back or not. High occupancy makes the most sense in a stable place, where few stores are going under and new entrants are rare enough that landlords can't raise rents in the hope that they'll appear.
People in Q&A communities really, really dislike it when a new user shows up just to copy-paste their homework assignment.
One of my system idle processes is coming up with lyrics to parody song concepts, like "Isn't it Pythonic?" or something. These random ideas used to end up consuming lots of time, but now ChatGPT can scratch the itch trivially. ChatGPT's lyrics aren't good, exactly, but they're good enough: a parody song is the "smart beta" of the media world, with levered exposure to the "familiar" and "novel" factors, producing a higher risk-adjusted outcome with minimal effort.
A Word From Our Sponsors
This issue of The Diff is sponsored by—The Diff! We're excited to announce the launch of DiffGPT, a service paying subscribers can use to search through the back catalog. A very common question from subscribers is something like "Hey, did you write a piece a few years ago about X?" and sometimes the answer was "I definitely did and I can't find it, either." We're excited to roll out new tools that make it easier to interact with our content. (And if you think something like this would be useful for your business, whether as a feature for a content play or just a way to make internal documentation easier to access, just hit reply and we'll chat.)
Goldman Sachs is looking for ways to exit its Apple partnership after winding down other consumer lending businesses ($, WSJ). The business case for Goldman to own a successful consumer-facing business was always there: sticky funding, a new place to deploy capital, less reliance on market swings, and less exposure to regulatory risk. But the business case for Goldman succeeding at this was always trickier: commercial banking without a branch network is a volume business with thin margins, and it's a business where the idiosyncratic risks usually get discovered the hard way. So they ran into the challenge that if the business started small, it would be a consistent money-loser that couldn't out-earn its overhead, but if it got big fast, then it would have to get big by using underwriting models that weren't well-informed by live data about the specific customers they'd lend to. Meanwhile, running this business through Apple meant doing something that Goldman has historically been quite averse to doing: getting involved in a series of transactions where the information advantage is persistently on the side of some third party. The most striking result of all of this is that Apple did to a big investment bank exactly what any big investment bank's customers worry the bank will do to them: get them to take the other side of a big transaction on disadvantageous terms.
This piece is a good look at some of the challenges new companies face in the exchange-traded fund business. ETFs are ~13% of the US equity market now, and continuing to grow, so they matter for understanding how capital gets allocated in public markets. One attention-grabbing detail: many ETFs try to market themselves through financial advisors, but advisors see their biggest upside as reducing fees (other than fees paid to them), not generating alpha. Niche funds have high fees because the business is so scale-dependent, and it turns out that retail investors who buy funds directly are much less fee-sensitive. So this is a case where the "get big fast" model doesn't work nearly as well as the "make something people want" approach, even if those people overpay a bit at first.
Snippet Finance highlights the spike in stock price dispersion—the share of individual stock price movements that can't be explained by broader market moves or factors—since Covid. As with so many phenomena: things have come down since the pandemic peak, but remain elevated relative to historical levels. In general, dispersion follows a similar pattern to volatility, with sudden spikes when there's a regime shift, followed by a gradual decline to baseline. The growth of pod shops has generally pushed dispersion downward: no matter how compelling the narrative about one company in an industry, if its valuation gets out of whack relative to peers, someone with a mandate to buy the cheapest stocks in an industry and short the most expensive ones will tug it back into normal bounds. But sometimes the nature of "industry" changes—AI promises higher variance and potentially massive rewards to any company whose business model depends on ingesting and outputting large volumes of unstructured text. In a pre-AI world, Google, Accenture, and Buzzfeed are in wildly different businesses, but for the moment, their valuations are all sensitive to exactly the same force.
Whenever there's a big rise in interest in some industry, a common practice is to look at who was excited about it early and how much they benefited from being ahead of the curve. Softbank said AI was their sole investment theme in 2018, mostly stopped making new investments in early 2022, and has largely missed the generative AI boom and missed profiting from AI's benefits to big tech incumbents ($, WSJ). One thing this speaks to is the intrinsic difficulty in turning a broad thematic view directly into an investment thesis, especially in private markets. Adverse selection is a problem for all investors everywhere: as a shorthand, if you're seeing a deal it's because 1) someone wants your money, and 2) everyone else they've talked to has rejected either their idea or their terms. If one set of investors focuses on founders and the other focuses on broad theses, the latter investor gets a portfolio with better stories but worse outcomes. The synthesis of these strategies is to identify themes that the right startups will glom onto early, but that's a multi-step process: not just selecting a category that will generate lots of wealth, but selecting one that the right kinds of founders will select at the right time.
Jeff Blau of Related Companies is investing $6.5bn in building ten new office buildings in New York and London ($, FT), despite the office market's well-documented problems. The fact that this looks like a viscerally bad idea is promising, since Blau is participating in an active market where prices are set by both supply/demand factors and sentiment. But this bet is not pure market-timing: if it were, the right move would be to buy rather than build, as there are plenty of distressed sellers. Instead, it's a bet that one reason rents are under pressure is that office buildings are designed for an in-person-by-default era. If companies have to persuade workers to work from the office rather than from home, the office building will need to do some of that persuading, and in this case that means new buildings with better amenities.
Companies in the Diff network are actively looking for talent. A sampling of current open roles:
- A startup building a new financial market within a multi-trillion dollar asset class is looking for generalists with banking and legal experience. (US, Remote)
- A company building zero-knowledge proof-based tools to enable novel financial arrangements is looking for a senior engineer with a research bent. Ideal experience includes demonstrations of extraordinary coding and/or math ability. (NYC or San Diego preferred, remote also a possibility.)
- A new AI company is looking for senior engineers with experience building scalable systems with Node and Typescript on AWS. Management experience is a plus. (SF)
- A proprietary trading firm is seeking systematic-oriented traders with ML experience—ideally someone who has displayed excellence in DS and ML, like a Kaggle Master. (Montreal)
- A crypto proprietary trading firm is actively seeking systematic-oriented traders with crypto experience—ideally someone with experience across a variety of exchanges and tokens. (Remote)
Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.
If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.