Big Tech Sees Like a State

Plus! Pirate’s Treasure, Redux; Takedowns; Antitrust and Laggy Beliefs; Money, The High-Order Bit; More...

Welcome back to The Diff.

This is the once-a-week free edition of The Diff, the newsletter about inflections in finance and technology. The free edition goes out to 16,308 subscribers, up 136 week-over-week.

Big Tech Sees Like a State

One of the classic slogans of 1960s protests, found on placards, buttons, and in chants, was: “I am a human being. Do not fold, spindle, or mutilate me.” The phrase referenced a warning on IBM’s punched cards. At one level, it was just a slightly geeky catchphrase about new technology, akin to wearing an “Error 404: Democracy Not Found” t-shirt. But at another level, it represented a deep anxiety. The people who used this slogan weren’t worried about punch card computers as an abstract force; they were worried about the punch card that had their name and draft number.

This anxiety is part of a long, long process. In Seeing Like a State, the anarchist-leaning historian James C. Scott describes this as the fundamental process of government: governments alter behavior in order to tax, conscript, and prevent the rebellion of their citizens/subjects. Scott uses the term “legibility” to understand this. A fully-legible citizen:

This is not the default state of human beings. It’s a multi-generational process. Seeing Like a State describes some examples, mostly failed, of governments trying to impose rules. In some cases, they work, but only at great cost—the USSR was able to collectivize farms and acquire grain, but millions starved in the process. In other cases, like the Meiji Restoration and France, the legibility-inducing process worked relatively well. And it may be an understated part of the Frontier Thesis in the US, too: it’s much easier to give people a fixed address and insist that they recognize property rights if you can give them, as an inducement, some new property at that address.

Scott assembles a panoply of examples: Russian Bolsheviks, Prussian foresters, American tomato entrepreneurs, medieval Thai tax collectors, post-colonial African reformers, modern farmers, ancient farmers, Brasilia, and Bruges.

Why is imposing legibility hard? The obvious reason is that most people do not want to pay taxes, serve in the military, or be prevented from mistreating the capital city’s plenipotentiaries at will. Another reason, though, is that all this legibility overrides systems that function perfectly well and rely heavily on local knowledge. Scott uses the term metis to describe this knowledge born of practical experience. Many of the examples he gives involve the very local business of growing things. A traditional farm might mix a number of different crops, grown according to ad hoc rules, without much in the way of non-natural fertilizers or pesticides. The result is a balanced diet, a hardy, pest-resistant farm, and very little taxable income. When governments and large companies try to create their own farms, they usually grow monocultures, plant them in rectangular fields, use pesticides and heavy equipment, and, in many historical cases, achieve much lower yields or lower-quality products than the old ways.

Taxes, too, replace webs of local obligations, which are often more attuned to a community’s needs and limitations. A top-down taxation system looks more efficient on paper, but compared to an organically evolved system, it won’t match the precise needs or preferences of the smaller groups it’s imposed on.

So there’s a strong theoretical argument against imposing legibility. There are many historical examples. And yet, very few people emigrate from highly legible countries like the United States and those of Western Europe in order to live in more informal ones. Meanwhile, many people around the world do choose to immigrate to those hyper-legible societies. Clearly (and Scott concedes this) legibility is not all bad.

One reason for this is that highly abstract, ultra-legible systems have more advantages at scale.

Theoretical knowledge faces a constant uphill battle against much more applicable metis. Heliocentrism had to contend with the observed lack of stellar parallax; the germ theory had to contend with the fact that miasma theory produced pretty good advice for staying healthy during a plague; Kahneman and Tversky’s bank teller example must contend with the fact that in a normal conversation, deliberately offering a misleading detail and then pointing out that it’s misleading is fairly rude. Theory can go further than practice, because it’s built on a firmer foundation. But there’s often a long gap between when a theory is elegant and when it makes better predictions than a lifetime’s worth of time-tested heuristics in the same domain.[1]

But metis is a hill-climbing algorithm. If it’s based on experience rather than theory, it’s limited by experience. Meanwhile, theory is not limited by direct experience. By the 1930s, many physicists were quite convinced that an atomic bomb was possible, though of course none of them had ever seen one. Because some things can’t be discovered by trial and error, but can be created by writing down some first principles and thinking very hard about their implications (followed by lots of trial and error), the pro-legibility side has an advantage in inventing new things.
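To make the hill-climbing comparison concrete, here is a minimal sketch in Python. The payoff curve, its two peaks, and the function names are all made up for illustration; the point is only that a search which accepts nothing but small local improvements stops at whichever peak is nearest to where it started.

```python
def hill_climb(f, x, step=1, max_iters=1000):
    """Greedy local search: move to a neighbor only if it scores strictly higher."""
    for _ in range(max_iters):
        # Listing x first means ties are resolved by staying put.
        best = max((x, x - step, x + step), key=f)
        if best == x:
            break  # no neighbor is better: a local peak
        x = best
    return x


def landscape(x):
    # Hypothetical payoff curve: a small peak at x=2 and a much taller one at x=10,
    # separated by a valley that small steps never cross.
    return max(4 - (x - 2) ** 2, 20 - (x - 10) ** 2)


from_experience = hill_climb(landscape, 0)   # starts near the small peak
from_elsewhere = hill_climb(landscape, 6)    # starts on the slope of the tall peak
```

Starting at 0, the search settles on the small peak and never learns that the tall one exists. Getting there requires a starting point that local experience would not have supplied, which is the role theory plays in the paragraph above.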

In a society that prizes legibility, more complicated forms of production can function. Instead of just farms and workshops, you can have factories. A factory is naturally more legible than a farm, because the factory takes external inputs (raw materials, machines, workers) and then produces uniform outputs.[2] A factory isn’t “native” to a particular physical location, and tolerates a wide range of climates and soil qualities. So the industrial revolution made society more legible through the simple expedient of making a larger proportion of it legible by default. And this legibility was self-reinforcing: if the factory makes tractors, the farms that buy them have more consistent outputs, so those farms, too, are more legible. More workers at factories meant more people who had to be at particular places at particular times, which meant that even people who didn’t work at factories still needed watches and clocks. And since factories require steady inputs, the arm of legibility reached outward: when factories are the main source of demand for mines, mines need to run at the cadence and variance of a factory. So do the stores that buy from them; it’s hard to amortize fixed equipment costs without continuous production, which requires continuous sales to keep itself going.

The most effective institutions tend to reshape society in their own image, and the more effective they are, the more profound the reshaping. Manufacturing and finance had a feedback loop here: complex supply chains can only function with reliable courts, uniform weights and measures, and a trusted currency. So industrialization drove all of the above. Scott talks at length about the politics of measurement: since feudal dues were set by custom, but defined by vague measurements, tax increases often took the form of enlarging the bag of grain used to measure rent denominated in a given number of bags, or arguments over what constituted filling a bag or basket. These informal systems might have given primitive political systems some fiscal flexibility, but they’d make any complex agreements untenable. A variable measure is poor collateral and makes it hard to hedge an obligation in one place with a future deliverable somewhere else. The modern system is less flexible, but the benefit is that it can produce much, much more.

Large-scale manufacturing imposes legibility in another sense. Any business with economies of scale works better the larger the market it serves, so a more manufacturing-based economy means less tolerance for parts of the economy that aren’t somehow plugged into the factory system.

The industrial revolution is a compelling example of economic growth imposing legibility, but it’s hardly the only one. A practical method for finding longitude at sea was developed thanks to a prize, which was offered as a way to subsidize trade. Railroads imposed uniform timetables across different cities: if a train is expected to arrive at 2 in the afternoon and depart again at 2:05, those times need to mean the same thing to the conductor, the passengers, and every other train conductor in the same network, as well as anyone expecting to meet a passenger at the station. More recent global trade made much more of the world legible: the shipping container, the dollar, and the ubiquity of English as a second language are all legibility-improving consequences.

But by far the biggest legibility imposers today are big tech companies. This is a recurring theme in The Diff, and may be the single most common one. Large tech companies create uniform identifiers for everyone they can: if you’re on Facebook, you have a unique ID in their system. If you’re not on Facebook, you’re still in the system, and they are no doubt assiduously trying to come up with reasons for you to finally join. These companies can do better than trying to teach everyone the same language; they can translate on the fly. They compile categorized, tagged, thoroughly-described data on products, people, and pages, and constantly analyze it.

And, for the average person, this is a material quality of life improvement. If you meet someone through a work function and don’t catch their full name, LinkedIn’s advanced search is very likely to narrow the list down; if you meet them socially, Facebook’s friend search, weighted by network proximity and other factors, will also help. Google, of course, surfaces all of the information on the public Internet in a convenient format, and Twitter gives you a real-time feed of it. Amazon makes their merchants use consistent descriptors within a given category, so satisficing on some criterion—cheapest laptop with a particular graphics card, for example—is straightforward.

These companies generally use legibility the same way governments do: to collect taxes. Governments try to price-discriminate when they tax people; charge too much, and you may discourage work or encourage tax avoidance. Charge too little, and, well, you could have collected more. Many tech companies spend their efforts getting ever closer to perfect price discrimination. The mechanics of ad auctions encourage bidders to pay their expected marginal profit for traffic, and even companies that start out with a non-ad model, like Amazon, end up using ads to capture the last bit of extra margin their suppliers were keeping. It’s a testament to big tech’s state-like capacity that Indonesia basically outsourced sales tax collection to them ($, Nikkei). Collecting taxes from individuals is a challenge that some states aren’t up to, but they can adopt the feudal model of granting the powerful privileges, like the right to do business in a country, in exchange for feudal dues.

Once you look for legibility, you start to see it everywhere. Every big tech company wants to control a measurement system, to ensure that the fundamental unit of some kind of communication is owned by them. And you see it across companies, too. Long supply chains work poorly when data is fragmented and hard to join, but they can work wonderfully when it’s all in the same format, with the same primary keys. Safegraph, for example, has a guide to data standards, which notes that they’ve made their own, a unique identifier for real-world locations. A system of long supply chains based on theoretical constructs about a world more complex than theory sounds brittle, but it’s more flexible than it looks; P&L discipline is a good way to keep dreamers grounded.
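The point about primary keys can be sketched in a few lines of Python. Everything here is hypothetical: the store, the vendors, and the "loc-001" style identifiers are invented for illustration and are not any real company's format.

```python
# Hypothetical records from two vendors describing the same store,
# each keyed by that vendor's own free-text name for it.
foot_traffic_by_name = {"Joe's Coffee, 12 Main St.": 340}
sales_by_name = {"JOES COFFEE #12 MAIN STREET": 1250.0}

# The same kind of records keyed by a shared, made-up location identifier.
foot_traffic_by_id = {"loc-001": 340, "loc-002": 95}
sales_by_id = {"loc-001": 1250.0, "loc-003": 410.0}


def join_on_key(left, right):
    """Inner join of two dicts: keep only keys present in both."""
    return {k: (left[k], right[k]) for k in left.keys() & right.keys()}


fragmented = join_on_key(foot_traffic_by_name, sales_by_name)  # nothing matches
joined = join_on_key(foot_traffic_by_id, sales_by_id)          # loc-001 lines up
```

With free-text names the join comes back empty even though both vendors are describing the same place. With a shared identifier it is a one-line operation, which is why owning the identifier is valuable.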

Getting taxed by big tech is a lot better than getting conscripted by them. (So far, the closest companies come to that is strongly promoting volunteerism). But it still feels like a worrisome trend. Governments expanded their state capacity, but they’ve often used it for ominous ends.

Fortunately for anyone who shares Scott’s skepticism of the legibility project, the end state for tech ends up creating a weird echo of the metis-driven illegible system we started with. The outer edges of ad targeting, product recommendations, search results, People You May Know, and For You Page are driven by machine learning algorithms that consume unfathomable amounts of data and output a uniquely well-targeted result. The source code and the data exist, in human-readable formats, but the actual process can be completely opaque. There is probably not a single human being at Google who can answer a question like “Why, when I search for X, is this site #4 while that site is #5.” The engineer might know what signals Google uses, and perhaps roughly what their weightings are, but every new signal adds new complexity, and the sum of a long tail of tiny signals can outweigh the human-tractable ones.

An ML-driven approach is only possible at large scale, and scale is only possible through legibility. But it’s the fate of all these legibility-imposers to move past legibility. They impose order on the world, and then they automate the order-imposing process, the order-imposer-refining process, and so on, until the end result is determined by a metis available to nobody.

This is an echo of how older legible systems worked. There were rules-based systems, and bureaucracies to implement those rules, but those bureaucracies worked through unspoken and informal systems. There are whole books explaining how specific bureaucracies work, and even those are not fully descriptive.

As a general rule in economics (and perhaps in every domain), most of the interesting results are embodied in the residuals. Equity is the residual claimant of whatever revenue a company can produce that isn’t claimed by employees, suppliers, creditors, and the government. Alpha is also a residual; it’s whatever part of an investor’s performance can’t be explained by purely statistical factors. Legibility is a continued effort to refine models so the residual is small and, ideally, normally distributed. But the process has limits, based on human bandwidth. We’ve reached a strange point in history where we reinvent at global scale an approach that worked at sub-Dunbar size. And it works well at its stated goals, even if every year it’s a little less legible.

[1] For example, it’s mathematically true that the aggregate returns investors get from the stock market must equal the market’s aggregate return, less fees, so an index fund with lower-than-average fees is guaranteed to be an above-average investment, at least as long as prices are being set by non-indexers. This is conventional wisdom today, and was theoretically true long ago, but it took Vanguard a while to convince anyone.

[2] Scott has an anecdote in Seeing Like a State about how even factories require local knowledge. A brand new machine may function exactly the way its specs indicate, but once it’s been in use for a while, small imperfections throw it off. A skilled operator can adjust for these, and for variations in the quality of materials. This kind of skill, though, gets less useful as the rest of the world gets more legible: better-quality equipment, cheaper replacement parts, more diagnostics to spot those imperfections, and tighter quality control on materials all diminish the value of local knowledge.
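The arithmetic behind footnote [1] can be sketched as follows. The return, fee, and market-share figures are illustrative assumptions, not data; the only thing the sketch relies on is that indexers earn the market return before fees, so everyone else, taken together, must earn it too.

```python
def net_returns(market_return, index_share, index_fee, active_fee):
    """Return (index_net, average_active_net) for a market split between
    index funds and everyone else. All inputs are annual fractions."""
    # Indexers hold the market, so they earn the market return before fees.
    index_gross = market_return
    # Whatever the indexers do not earn is earned by the active investors,
    # spread across the share of the market they hold.
    active_gross = (market_return - index_share * index_gross) / (1 - index_share)
    return index_gross - index_fee, active_gross - active_fee


# Assumed numbers: 7% market return, 40% of assets indexed,
# 0.05% index fee, 1% average active fee.
index_net, active_net = net_returns(0.07, 0.40, 0.0005, 0.01)
```

Before fees the two groups earn the same return by construction, so the only difference in the averages is the fee gap. Individual active investors can still beat the index; the claim is about the average.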


Pirate’s Treasure, Redux

In yesterday’s issue, I pointed out that accessing almost a billion dollars worth of Bitcoin once controlled by the “Dread Pirate Roberts” may, technically, be the first instance in human history of someone finding a pirate’s treasure map and discovering actual treasure. Two updates today:

  1. “Over,” not “Almost,” $1bn, and
  2. We now have the backstory: the money was stolen from Silk Road by a hacker, whom the government later identified. They prevailed on this unknown hacker to forfeit the money.

This appears to be a record for the proceeds of one hack, and one of the most valuable stolen goods of all time (it ranks below the Empire State Building). It’s also good evidence for what Bitcoin’s short-term bears and long-term bulls have always argued: any currency with a permanent ledger is an unwise one to use for criminal activities.


Takedowns

A legal strategy that goes in and out of vogue is to take down damaging online content by filing dubious copyright claims. The black-hat version of this is to write a copy of the offending content, backdate it, and then claim the original version is a copyright violation. This has worked ($, WSJ), but Google is catching on. The other technique, used in the last few days by online test-taking software company Proctorio and Netflix, is to find cases where people complaining about a given company cite material that the company has copyrighted, and then claim it’s an infringement. The default content moderation approach for most companies is to assume content doesn’t violate copyright by default, but to also assume that copyright claims aren’t frivolous by default. In this state, anyone with a loose interpretation of the law has a first-mover advantage.

Antitrust and Laggy Beliefs

The US government has sued to block Visa’s acquisition of spending data aggregator Plaid. The argument is interesting: it’s not that Plaid competes with, is a supplier to, or is a customer of Visa, but that Plaid’s product gives them the capability to launch a Visa competitor in the future.

As a general rule, antitrust actions lag investors in terms of what they think matters about a given company or industry. Microsoft, IBM, and AT&T all got in trouble over the part of their business investors viewed as a melting ice cube. This case shows some progress in antitrust state capacity, since it actually does match what a smart Plaid investor would see as Plaid’s long-term upside.

Money, The High-Order Bit

Facebook now lets WhatsApp users make payments in India. This is something they’ve planned for a while. I wrote about it in June:

Communication, identity, and payments are all fundamentally tied together. Any payment system that doesn’t involve hard currency has some form of identity verification. Phone and messaging companies have an incentive to do a moderate amount of identity verification in the course of their business; someone who signs up for WhatsApp and spams a thousand people but never gets a reply is probably a bad actor, whereas someone who signs up and starts a series of back-and-forth conversations is more likely to be real. The messaging product is its own proof-of-work. Looking at WhatsApp today, you could imagine that this was the plan all along: bootstrap a universal identity system and communications network, and use it to create smartphone-based payment rails that skip legacy payment systems and enable more transactions.

Perceptual Arbitrage in Crowdfunding

For any consumer electronics, made-in-Shenzhen is the null hypothesis unless there’s a very good reason it should be made somewhere else. The Chinese manufacturing base has scale and flexibility that other locations can’t match. But it also has a (somewhat dated) reputation problem. This has changed the economics of crowdfunding, by encouraging China-based project founders to market their product as not-necessarily-from-China. This is not simply because of Western biases, though:

“Even if we’re doing something that’s purely for a domestic audience, sometimes we’ll have it with Western actors,” he says, which he credits to Chinese companies often facing prejudice against their products. They want these Western actors so “they’re perceived as being more high end, or reliable, or maybe in a higher price bracket,” he says.

Some economic forces are stronger than individual companies, and end up reshaping those companies to look like what they replaced. Chinese companies have a comparative advantage at manufacturing, and US companies have a comparative advantage at marketing. As a channel like crowdfunding matures, it increasingly looks like what it replaced.

Covid and the Talent Shift

Jason Lemkin says that for the first time, small companies outside of the Bay Area can tap into the Bay’s talent network.

Just in the past few weeks, 3 top, brand-name CMOs I know joined $10m+ ARR start-ups HQ’d in Berlin, in Atlanta, and in Colorado.  That never, ever would have happened before Covid. Because most top talent wants to be close to the CEO. You would. It’s just so much easier to excel that way, and so much riskier to be far from the CEO if you’re a top executive.

This is an underrated aspect of cities’ network effects. Talent density is self-sustaining, because the companies that hire colocate with the employees they want to hire, and job negotiating leverage is partly driven by the number and quality of next-best opportunities. This trend cuts against the argument that future tech companies will have a “mullet” approach, with their senior executives located in the Bay Area and the rest of their employees working somewhere with a low cost of living. As has been pointed out many times, network effects cut both ways: every time someone leaves the network, it makes staying behind a worse idea.