In this issue:
- Handy Robots—Robots can do many useful things, but mostly in the sense that there are many robots that are good at specific tasks and useless for others. They're competing with evolution to build general-purpose motor/sensor collections that can match the human hand, and getting closer to that outcome is a big deal.
- It's Personal—AI is a small world, and some of the most significant events in corporate AI history come down to personal animosity.
- Rationing—Anthropic wants users to figure out the best way to keep GPU utilization as high as possible over the course of a day.
- Penny Stocks—Making a few million dollars while costing public equity investors a few billion is not very nice.
- The Text Glut—Different platforms need different rules for dealing with AI.
- Sora—In retrospect, it was pretty daring to be at least the third company to accept billions of dollars in losses or missed profits in order to build a short-form video app, especially if that app was structurally more expensive to operate.
Handy Robots
As a technology, GPT-3.5 is just one point on whichever scaling curve you choose to use. It was a bit better than what came before and a bit worse than what came after. But, fine-tuned for chat and wrapped in a nice interface, it's the most successful product launch of all time in terms of market cap creation. It's been 1,216 days since then, and a total of about $13T of AI-related market cap has been created. That's around 10x the amount of Internet-related market cap added in the same period after the Netscape IPO, which was a similar cultural moment for the dot-com cycle.[1] Incremental technical advances are like this. The Wright Flyer was slightly better than other attempted planes, mostly because of its better controls. But that slight difference was the difference between nearly flying and barely flying, which turned out to be a big deal.[2] Once LLMs were good enough that you could interact with them in natural language and get back natural language answers, and it was also clear that there was a path to improving the quality of these answers, there was suddenly a real investment case.
So, if ChatGPT showed that AI can interact with the world of abstract information in a humanlike way, what shows that it can interact with the physical world similarly? It's probably a robotic hand: a 1:1 copy of the human hand—sensors in the same places, a similar mix of strength and precision, a world model that can be adjusted in response to multi-sensory feedback with low enough latency to operate chopsticks, etc.
As with heavier-than-air flight, we've been working on this one for a long time. Artificial hands long predate robotics, because historically a pretty good way to lose a hand was to be involved in a farming accident or a battle, and a good way to survive long enough not to bleed out from it was to be an aristocrat of some sort, who was rich enough to be worth keeping alive and also rich enough to see missing a hand as a problem to be solved. A French barber-surgeon, Ambroise Paré, fiddled around with mechanical hands in the 16th century, eventually building one whose owner said it allowed him to hold the reins of his horse.[3] You can definitely whittle a hand-shaped object out of wood, given enough time. And you can construct a simple mechanical hand that can grasp or let go. But the sheer range of activities the human hand is capable of is hard to match.
To get a sense of the evolutionary optimizations you're competing against, consider the fingernail. Claws make it hard to pick up and manipulate objects, but a completely soft fingertip is vulnerable to damage; a nail protects the finger and gives the fingertip something to press against. So fingernails rather than claws mean that our fingers are more sensitive and dexterous. Meanwhile, the things we make and use tend to assume fingernails; a grocery store can wrap something in cellophane and assume it won't be tricky to remove, medicine can come in blister packs (or in a bottle with a removable seal), and Lego doesn't have to include a Lego remover with every set they sell.
And that's just one little detail. Hands can grip for power, or for precision (think of the grip you have on a hammer compared to a nail). They can respond exquisitely well to feedback—it would be hard to screw in a lightbulb or play a violin solo without fingers that were able to instantly adjust based on feedback. Voice might be a bigger deal in terms of interpersonal communication, but gestures still matter. Floor traders have elaborate hand gestures they can use to transmit and confirm orders across a noisy trading floor. In other circumstances, like driving, verbal communication may be challenging and gesture-based feedback can be more efficient. There's incredible room for expressiveness here. Just different arrangements of raised and lowered fingers can theoretically describe 1,024 different states (two options per finger, across ten fingers, is 2^10); 00100 is a way to provide negative feedback, while 01001 means, depending on context, that you're a fan of either heavy metal or University of Texas athletics. Of course, if you have a lookup table that has a five- or ten-digit key and then looks up values, it's pretty straightforward to implement that in a computer program, as the sketch below shows. It's the versatility of hands that makes them such a big deal. (ChatGPT's Fermi estimate of the total share of bits used to affect the outside world that are transmitted through hands is 80%, though that's dominated by the early parts of the time series, where literally-manual labor was ubiquitous. More recently, the fraction of bits transmitted by hand has gone up—think of all the phone calls that have been replaced by text messages, and all the longform writing that wasn't worth the inconvenience of finding an envelope and stamp but that's worth doing with a low-friction tool like email.)
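A minimal sketch of that lookup table in Python. The bit ordering (thumb first, pinky last) and the gesture meanings are assumptions for illustration, not a standard encoding:

```python
# Encode one hand as a 5-bit string, one bit per finger
# (thumb, index, middle, ring, pinky); two hands would give 2^10 = 1,024 states.
GESTURES = {
    "00100": "negative feedback",
    "01001": "heavy metal, or University of Texas athletics (context-dependent)",
    "01000": "pointing",
    "11111": "open hand / stop",
}

def decode(fingers: str) -> str:
    """Map a 5-bit finger pattern to a meaning, if one is defined."""
    if fingers not in GESTURES:
        return f"unmapped state {int(fingers, 2)} of 32"
    return GESTURES[fingers]

print(decode("01001"))  # -> heavy metal, or University of Texas athletics
print(decode("10101"))  # -> unmapped state 21 of 32
```

The table itself is trivial, which is the point: the hard part isn't mapping states to meanings, it's the hardware that produces and reads those states.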
Hands are a universal interface, modulo some differences like poor coordination, arthritis, Parkinson's, and missing digits and limbs. So almost every tool you buy is going to be built around human hands. And the human hand has millions of years of evolution behind it—millions because if you go back early enough, brachiation competes with manipulation as the most important use case (a hand that's bad at prying open a coconut can be evolutionarily beneficial if it's good at helping you scramble up a coconut tree to avoid predators). Tool use started sufficiently long ago that the hand has coevolved with it; muscles and tendons don't show up much in the fossil record, but my guess is that anatomically modern humans are way better at using a can opener than Homo habilis was. So the bots have a lot of catching-up to do.
The hard part about emulating hands is that there's a whole package. Think of a process like unlocking a door. You have to fish through your pocket for a key (so you're navigating, entirely by touch, some area between two layers of fabric; you're helped by your world model, which tells you that a small, dense object like a metal key will probably sink under your wallet and whatever else is in your pocket). Then you have to maneuver it out, without dumping the contents of your pocket, aim it roughly at the lock, slide it in—but don't punch through the door—and then turn it (but not too far!). And then detect that it's done, turn the key back, and deposit it back into your pocket. You can do this entirely unconsciously, but you're running a multi-modal model that uses touch, vision, and sound, with latency from sensory input to control output measured in the tens of milliseconds.
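To make that latency budget concrete, here's a toy version of the loop in Python. Everything in it (the stubbed sensors, the 50Hz tick, the thresholds) is an illustrative assumption, not a description of any real robotics stack:

```python
import time

TICK = 0.02  # 20 ms per iteration, i.e. ~50 Hz: a tens-of-milliseconds budget

def read_touch():   # stub: pressure at the fingertips
    return {"pressure": 0.4}

def read_vision():  # stub: how far the key looks from the keyhole
    return {"key_offset_mm": 1.5}

def read_sound():   # stub: did the lock click?
    return {"click": False}

def control_step(touch, vision, sound):
    """Fuse three sensory streams into a single motor command."""
    if sound["click"]:
        return "reverse_turn"      # done: back the key out
    if touch["pressure"] > 0.8:
        return "ease_off"          # don't punch through the door
    if abs(vision["key_offset_mm"]) > 1.0:
        return "adjust_aim"
    return "advance_and_turn"

for _ in range(5):  # a few ticks of the loop
    start = time.monotonic()
    cmd = control_step(read_touch(), read_vision(), read_sound())
    # sensing -> decision -> actuation all has to fit inside the tick
    time.sleep(max(0.0, TICK - (time.monotonic() - start)))
    print(cmd)
```

The human version runs dozens of far richer loops like this in parallel, unconsciously, which is what the robots are up against.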
What might a path to handy robots look like? The general approaches would be:
- Reach human-hand parity on one task, then add another; if your robot hand can operate a can opener, maybe tweak it to twist the top off a jar, too—you're still stabilizing an object with one hand and spinning something with the other, just at a different angle.
- Gradually enhance a general-purpose robotic hand, so it starts out about as capable as a mannequin hand, then works as well as the hand of someone with severe arthritis, and improves from there.
- Make a hand-like system that's similar to human hands, just much slower, because it takes so long to process the information input—you might change what your hands are doing based on what you see, what you hear (either a response caused by the physical object you're manipulating, or someone saying "Stop that!"), and what you feel. In humans, these models are mostly integrated in real time, with exceptions like flinching away from a painful stimulus before consciously registering it. Some tasks are not speed-sensitive, though for others (whisking eggs, juggling) speed changes the qualitative nature of the activity.
All this is really a question about techno-optimism: there's a vision of the future where the vast majority of white-collar work is automated, but AIs still pay humans to do tricky tasks beyond the capabilities of superhuman intelligence, such as changing a diaper or petting a cat without crushing it to death. And there's another where humans and robots are interleaved across some tasks, and there aren't any humans-only domains. Robotic hands make that possible; they mean that you can drop in a robotic replacement for a human worker in arbitrary contexts, and expect it to open doors, turn dials, plug things in, pick up individual items from an assembly line and put them in boxes, etc.
A robotic hand that can turn a doorknob, dice an onion, and operate a yo-yo is an unlock for the world of atoms in the same way that a program that can finish a dialogue or turn a natural-language description into a working computer program is for the world of bits. Text is the universal interface for transmitting knowledge, and almost every way to turn thought into some change in the real world uses hands as an interface by default.
We'll be writing more checks through Anomaly in the near future into early-stage deep tech companies. So if you know anyone working on either robotic hands or robots that obviate the hand problem, please don't hesitate to give me a hand by introducing them.
It's fair to compare these events because they were a resounding answer to the big question in that category. AI was a topic tech companies had been talking about for years, and had implemented in their products in mostly invisible ways, but investors weren't especially focused on it: they'd see things get slightly better over time, and hear management attribute that to AI, but there wasn't an AI-specific product to anchor to. Better autocorrect and more robust searches for synonyms are nice, but they don't constitute a brand new product or category. ChatGPT was a new product that could only exist thanks to advances in AI, and it switched people from thinking about AI as a 1-to-N enhancement of existing scaled products to seeing it as its own category. For the Internet, nerds were excited, but it was unclear when the business case would pan out: people needed to get used to entering their credit card number online, and the consumer expectation for shipping was that it was slow—in a few episodes in the 90s and early 2000s, The Simpsons would show a character ordering something from a catalogue, and then show a title card reading "Six to eight weeks later." And killing catalogue retail was less exciting than killing all of retail. The Netscape IPO was proof that someone out there thought that Internet-based companies could be a viable business, and entrepreneurs and VCs responded to that incentive by brute-forcing every combination of buzzwords that could potentially fit into an S-1. ↩︎
This, too, led to an equity response, including a rally in the shares of Seaboard Air Line, which was actually a railroad that had that name because "air line" meant "the path you'd take if you were hypothetically traveling through the air." A century later, various companies have realized that if their software engineers either write `import sklearn` or might hypothetically do so in the future, they can add "AI" to their name and pump up their stock. One of the funnier instances of this is Jet.AI, whose schtick is that they're a private jet company that lets people book through a chatbot. But when they first agreed to a de-SPAC merger, they were Jet Token, whose pitch was that they were a private jet company you could pay in crypto, though a separate filing notes that "To date, we have not received blockchain currency as payment." ↩︎
Though in this case we have two layers of motivated reasoning: we get this story from Paré's own book, in which he is, well, talking his own book. And the customer is at least a little motivated to say that he didn't get ripped off by the prosthetic-hand salesman. ↩︎
You're on the free list for The Diff. Last week, paying subscribers got a preview of our exciting future, featuring AI necromancy ($), thoughts on how VCs, whose product is so fungible, differentiate themselves ($), and how consumer Internet companies organically segmented their markets in a way that AI companies won't ($). Upgrade today for full access.
Diff Jobs
Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:
- Ex-Anduril, Ex-Abnormal Security, Ex-Bridgewater, fast-growing startup providing agentic cybersecurity to the long tail via MSPs is looking for platform and machine learning engineers. Startup experience preferred; what matters most is that you've grown in scope and handled ambiguity over the last few years. (SF)
- Series A startup that powers 2 of the 3 frontier labs’ coding agents with the highest quality SFT and RLVR data pipelines is looking for growth/ops folks to help customers improve the underlying intelligence and usefulness of their models by scaling data quality and quantity. If you read arXiv, but also love playing strategy games, this one is for you. (SF)
- A Google Ventures-backed startup founded by SpaceX engineers that’s building data infrastructure and tooling to accelerate product development for hardware companies is looking for a deployment strategist to ensure that the platform creates maximum value for customers with sophisticated engineering organizations across aerospace, transportation, renewable energy, and more. (LA, Hybrid)
- High-growth startup building dev tools that help highly technical organizations autonomously test and debug complex codebases is looking for forward deployed engineers who want to dive into customers’ complex software systems, find pressing business needs and deploy a cutting edge platform to help thoroughly test mission-critical applications. Experience with fuzzing or property-based testing a plus! (SF, London, D.C.)
- Series-A defense tech company that’s redefining logistics superiority with AI is looking for an MLE to build and deploy models that eliminate weeks of Excel work for the Special Forces. If you want to turn complex logistics systems into parametric models, fit them using Bayesian inference, and optimize logistics decision-making with gradient descent, this is for you. Python, PyTorch/TensorFlow, MLOps (Kubernetes, MLflow), and cloud infrastructure experience preferred. (Salt Lake City or NYC)
Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.
If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.
Elsewhere
It's Personal
The WSJ has a lengthy piece on the ongoing feud between Sam Altman and Dario Amodei ($, WSJ). It's important to read stories like this with caution, because the fewer people there were in the room for a given interaction, the more likely it is that the story is very much one person's point of view. And since Altman and Amodei have accused Amodei and Altman of sneaky behavior, that should factor into your assessment of their credibility.
One of the necessary results of AI being a business that requires lots of hardware and that pays employees well is that, relative to the industry's size, it's a very small world where lots of the luminaries are longtime friends, romantic partners, former or current roommates, etc. So some of the big strategic moments in the industry, like Anthropic splitting off from OpenAI, are heavily influenced by personal tiffs. It's not the first time a tech buildout has involved a pretty small-world social graph—Henry Ford and Thomas Edison were hiking buddies, and in the early 2000s the CEOs of GE and Microsoft, two of the largest-market-cap companies, had been office-mates at Procter & Gamble a quarter-century earlier. But it's particularly acute in AI, and unavoidably so.
This is probably good for both the social impact of AI and investors' returns, though worse for overall AI progress. If personal disagreements contribute to the existence of a more fragmented AI industry, and if these companies' founding stories are all about how they're aiming to be an alternative to their immoral competitors, then we'll have more heterogeneous AI capabilities, and companies will at least try to frame competition as pursuing particular virtues their competitors lack.
Rationing
Anthropic is adjusting their usage caps so it's costlier to use them during peak hours. Inference is an interesting market because, like electricity, it has to clear continuously, and it's easier to shift demand than to store supply. When they set up a queuing mechanism like this, they're asking users to decide what the optimal tradeoff is between price and latency. (Among other things, this pricing change probably challenges Excel's or Google's all-time record for the percentage of a product's users who use the product to decide how to deal with a price change.)
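The user's side of that tradeoff is easy to model. A toy sketch in Python, with made-up numbers (the peak multiplier and the value of immediacy are assumptions for illustration, not Anthropic's actual pricing):

```python
# Toy model: run a job now at peak prices, or defer it to off-peak?
# All numbers are hypothetical.
OFF_PEAK_COST = 1.00     # $ per job, off-peak
PEAK_MULTIPLIER = 3.0    # assume peak costs 3x off-peak
DELAY_HOURS = 6          # how long until off-peak rates apply

def run_now_or_wait(value_of_immediacy_per_hour: float) -> str:
    """Compare the peak premium to the cost of waiting."""
    peak_premium = OFF_PEAK_COST * (PEAK_MULTIPLIER - 1)
    waiting_cost = value_of_immediacy_per_hour * DELAY_HOURS
    return "run now" if waiting_cost > peak_premium else "wait for off-peak"

print(run_now_or_wait(0.10))  # a batch job where latency barely matters -> wait
print(run_now_or_wait(1.00))  # an interactive session -> run now
```

Every user who runs some version of this calculation and defers their batch jobs is doing exactly what Anthropic wants: flattening the demand curve so the same GPUs clear more work per day.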
Penny Stocks
Last month, logistics stocks sold off after a tiny company that had recently pivoted from karaoke machines to AI announced that it had a better truck-routing algorithm. As it turns out, that company had recently raised money from an investor with a checkered past ($, FT) and a habit of buying convertible notes from tiny companies just before those companies hype themselves up.
Just look at the flow of dollars involved, and you can see the appeal. A tiny company could issue a press release, hype itself up, and then rush through a secondary offering, but attention peaks quickly, and it would sell at a less attractive price. If they lock in an equity sale agreement before hyping the stock, more of the retail buying they engender turns directly into cash for them.
Usually, this kind of behavior is small-scale enough that it's not a major enforcement priority; the SEC catches people doing this all the time, but there are a lot of them. However, in this case, it's a small-time grift that at least temporarily wiped out $17bn in market cap. So going after this is a bit like arresting people who steal copper pipes or catalytic converters: less about the proceeds they get and more about the negative externalities they impose on everyone else.
The Text Glut
Wikipedia is banning AI-generated content, and the number of apps submitted to the app store is up 55% YoY, adding delays to app approval. There turn out to be many cases where the cost of producing a given amount of text or code was effectively a tax on producing it for spurious reasons. When the cost of creating a sequence of tokens declines, we get more tokens, but we also see other bottlenecks, like human approval, become more important. Wikipedia is probably more socially valuable if they forbid LLM content, because they'll provide cleaner training data (and, very conveniently, they link to sources, so we get training data on the specific question of how to readably summarize a complicated topic). The other reason the ban makes sense is that an LLM-generated wiki article is superfluous when you can just generate the article you need by asking the chatbot of your choice. This is particularly helpful for hypothetical Wikipedia articles that take the intersection of a bunch of other articles—ChatGPT is great at answering history questions like "When and why did countries' leaders stop leading their troops in battle?" or "What are some works of fiction about having writer's block, written by people who wrote that story in order to deal with writer's block?" The latent space of potentially useful Wikipedia articles that don't exist yet is vast, and the space of such articles that more than one person would want to read is sparse. So it's actually quite useful to keep the generative AI part separate from the human-vetted one, at least for now. For the app store, model collapse is less of a concern because apps get feedback from usage and revenue, so it's safer for Apple to find ways to speed up the approval process—perhaps taking advantage of some newfangled ways to review code at scale—and let the AI-created app influx continue.
Sora
The WSJ has an obituary for OpenAI's now-deprecated generative video product, Sora ($, WSJ). It was briefly popular, never quite reached escape velocity, and was costing them $1m/day worth of scarce GPU capacity.
One way to understand what went wrong is to compare Sora to TikTok, where the consumption experience is similar—not just because it's a sequence of random short videos, but also because the social graph is less important than recommendations based on user preferences. TikTok spent vast sums on user acquisition before it got to the point that many users were sticking around and making new videos, and they had to burn a lot of cash to get to critical mass. Sora had the same problem, with two twists: first, following the same strategy would cost them even more money because they'd pay to acquire users who were also expensive to serve. And second, there are already several scaled short-form video apps, where the most viral Sora content could be reposted to get more views. So to scale Sora, OpenAI would have to simultaneously subsidize its own users with inference and its own competitors with content, until it got the flywheel going. And even for a company that raises money in $10bn increments, that's more costly than it's worth.
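A back-of-envelope version of that argument: the $1m/day inference figure comes from the WSJ piece, but the acquisition-cost and retention numbers below are invented for illustration:

```python
# Back-of-envelope for the double subsidy. Only the inference burn is sourced
# (WSJ); the CAC and retention figures are hypothetical.
INFERENCE_BURN_PER_DAY = 1_000_000   # $, serving existing users (WSJ figure)
CAC = 5.0                            # $ per acquired user (assumed)
RETENTION = 0.20                     # fraction of acquired users who stick (assumed)

def daily_cost_to_grow(new_users_per_day: int, serve_cost_per_user: float) -> float:
    """Growth means paying twice: to acquire users and then to serve them."""
    acquisition = new_users_per_day * CAC
    added_serving = new_users_per_day * RETENTION * serve_cost_per_user
    return INFERENCE_BURN_PER_DAY + acquisition + added_serving

# TikTok's problem was mostly the acquisition term; Sora pays both, and the
# serving term compounds with every retained user.
print(f"${daily_cost_to_grow(500_000, 0.50):,.0f}/day")
```

Whatever the real numbers are, the structure is what matters: for TikTok, growth spending was front-loaded and marginal serving costs were trivial, while for Sora every successfully acquired user makes the daily burn permanently higher.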