Working with a Copilot

Plus! Catching Up; Private Capital; AI Regulation; AI Stocks; Cars and Interfaces

Working with a Copilot

For white-collar workers, the rise of Large Language Models, or LLMs, has created a very nerdy version of the opening of Alec Baldwin's speech in Glengarry Glen Ross. The bad news is, you're probably fired. The good news is, you're on a temporary probationary period in which you've gotten a nice promotion and now have a direct report with an unlimited attention span, a wide range of somewhat superficial knowledge, and a frustrating tendency to make elementary mistakes that require close supervision. You might be frustrated to work with such a subordinate, but at $20/month they're not asking for much.

Since many white-collar jobs are fundamentally about symbol manipulation—take in a stream of bits, perform some analysis, and output another stream of bits—they're all amenable to some level of automation. On the other hand, if you're a person with a paycheck and not a script run by a cron job, your work probably can't be fully automated just yet.

So one way to frame the task is to ask a two-part question:

  1. Which of the things you do can be automated? (And note that one part of automation is setting up things like scripts to download your emails and perform some batched action on some of them—not only is there more you can do with data, but the overhead of getting the data has gone down.)
  2. Which automatable components of a job are the most important complements to other work? There are some forms of automation that just decrease the time cost of certain kinds of overhead, like converting notes into bullet points or converting bullet points into prose. In other cases, though, the best thing to do with a 90% cost reduction is to 10x the output instead.
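The "batched action" idea in the first point can be sketched in a few lines. The message data and the `batch_by_sender` helper below are hypothetical; a real version would pull messages via something like `imaplib` or a mail provider's API.

```python
from collections import defaultdict

# Hypothetical messages; in practice these would come from an inbox download.
inbox = [
    ("alerts@example.com", "Nightly job failed"),
    ("newsletter@example.com", "This week's links"),
    ("alerts@example.com", "Nightly job recovered"),
]

def batch_by_sender(messages):
    """Group (sender, subject) pairs so one action can handle each batch."""
    batches = defaultdict(list)
    for sender, subject in messages:
        batches[sender].append(subject)
    return dict(batches)

batched = batch_by_sender(inbox)  # e.g. archive all the alerts in one pass
```

The point is less the grouping itself than that the glue code around it—fetching, filtering, acting—is now cheap to produce.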

Part of working alongside LLMs means figuring out where you have a comparative advantage and where the machines do. This is very easy to see in coding, either by asking ChatGPT to write you code based on a text description or by using a code completion tool (I use Mutable, in which, full disclosure, I'm a happy investor). It's very easy to sketch out a project, and for most of the boilerplate components you're really just writing a descriptive function name and letting the model do the rest.

But you can't just mindlessly replace the process with AI. You'll get a poorly-structured mess in no time at all—with the awesome power of AI, you can create a year's worth of technical debt in a single afternoon! But this is a version of a problem that's existed for a long, long time: every higher-level language increases the productivity of writing code while potentially decreasing the productivity of running it.[1] For LLM-assisted code, the big tradeoff to keep in mind is between writing something fast and being able to maintain it. (Interestingly enough, this puts an even bigger premium on using very readable languages, because you'll spend relatively more of your time looking at code you didn't write and making sure you understand how it works.)

LLMs probably won't tell you to refactor your code, but they can make it faster; in one recent project I noticed that ChatGPT had produced the same boilerplate in three different functions, so I pasted it in, asked ChatGPT to make a single-purpose function, and then switched to that. It doesn't help with more complicated decisions, but it does help to implement them. "Rewrite the following function to produce a timestamp, and rewrite this other function to accept a timestamp as one of its inputs" is a task that ChatGPT handles quite easily.
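That kind of refactor—the same boilerplate pasted into three functions, collapsed into one single-purpose helper—looks something like this. The CSV-parsing boilerplate here is an invented stand-in for whatever the repeated block actually was.

```python
import csv
import io

def load_rows(text):
    """The boilerplate that previously appeared verbatim in three functions:
    parse CSV text and drop empty rows."""
    return [row for row in csv.reader(io.StringIO(text)) if any(row)]

def count_rows(text):
    return len(load_rows(text))

def first_column(text):
    return [row[0] for row in load_rows(text)]
```

The human decision was noticing the duplication and deciding it was worth a shared function; the mechanical rewrite is the part the model handles easily.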

One limitation on large language models is their context window, or the volume of text they can actually consider at a time. Older models are forgetful, and newer models are less so, but they all have this limitation. And so do human beings; in terms of literal memorization, we can hold 7 +/- 2 tokens. But when we're memorizing things in a structured way, our effective context window is much, much wider than an LLM's. We can chunk together related concepts in memory, use analogies to stay on track, suddenly remember context when it becomes relevant again even if it hasn't mattered in a while, and otherwise perform GPT-5-worthy feats of attention. This is core to the thesis that using LLMs means getting promoted to a management role, because a manager is also supposed to have broader context that helps them manage the parallel efforts of individual contributors who don't all share the same information.

There is another sense in which using LLMs to help with projects ties in with human memory. It's especially important to parents who work from home and do not strictly believe in the concept of "work/life balance." If you're toggling back and forth between ChatGPT and a code editor, interruptions are much less distracting because you can alt-tab and immediately remember exactly what you were working on—just read the last few questions you asked ChatGPT. This radically increases the scope of projects you can work on with less than 100% attention. It's like a blinking cursor marking where you were in your train of thought.

One more way to think about the AI-assisted workflow is that for programming tasks, the starting question is "Should I write this myself, or should I find a library that does roughly what I want and then figure out how to make it do exactly what I need?" And for that coding assistants can help in three ways: first, by identifying the library you need that you didn't know existed, second, by helping to convert your task into something formatted for that library's idiosyncrasies, and third, in cases where there isn’t a relevant library, by generating code that does what such a library would do—or, to put it another way, by exploring the latent space of hypothetically possible libraries and finding what you need. In other words, they're most useful in cases where many people have done analogous tasks, but no one has done a precisely identical one before. What percentage of knowledge worker tasks does this describe? We can have a lively debate over this one, but it's going to be a debate over what we put between "99." and "%".

If you don't need to know the syntax for Pandas to move your data analysis from Excel to something better, and if you don't need to remember the name of some obscure feature you use every few months, do you need to remember... anything?

Yes. Memorization is surprisingly powerful in general, and even though there's a lot that you no longer have to look up, that means there's relatively more value in knowing what you otherwise wouldn't think to look up. The contents of Learn How To Irresponsibly Extrapolate From Limited Data in 24 Hours or Data Science for Buffoons are now available to anyone who can type a function name into VSCode and hit tab a few times to have their coding assistant write the rest. But the return on knowing when to use which technique is a lot higher, as is the return on knowing about tricky edge cases. ChatGPT is perfectly happy to write some code for you to fit a linear regression to nonlinear data, or to make all sorts of dumb bias/variance tradeoffs in your analysis. If you want to run an analysis on ten data points that wouldn't produce statistically significant results for ten thousand, you absolutely can. With a lower cost to making bad decisions fast, there's a higher value on making good decisions. Of course, some people will outsource these decisions to LLMs, too, but LLMs are better at sounding informed than being informed, so the result will be a bull market in the Dunning-Kruger Effect.
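To make the linear-regression-on-nonlinear-data failure concrete: here's a hand-rolled ordinary-least-squares fit applied to perfectly quadratic data. The code runs without complaint and returns a flat line that misses the structure entirely—a sketch of the failure mode, not of any particular ChatGPT output.

```python
def linear_fit(xs, ys):
    """Ordinary least squares for y = a + b*x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum(
        (x - mx) ** 2 for x in xs
    )
    a = my - b * mx
    return a, b

# Perfectly quadratic data, y = x ** 2, on a symmetric range.
xs = list(range(-5, 6))
ys = [x * x for x in xs]
a, b = linear_fit(xs, ys)
# The "best" line is y = 10 + 0 * x: zero slope, all of the real pattern lost.
```

Nothing in the fitting procedure flags the problem; knowing to plot the residuals, or to question the model family in the first place, is exactly the judgment that doesn't get automated.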

ChatGPT is also happy to do things that don't make much business sense, especially if your question doesn't embed the relevant context. So domain expertise is more valuable than it used to be because one of the complements to it—ingesting lots of information and extracting the important bits—has suddenly gotten cheaper.

Consider the work of an equity analyst. If they're covering a set of companies, they'll definitely be reading the news releases and conference call transcripts from those companies. Ideally, they'd also do the same for those companies' biggest customers, suppliers, and indirect competitors. They won't necessarily have time. But now they do! ChatGPT is happy to summarize conference call transcripts (so long as you paste in excerpts that fit within the context window—GPT-4 is a big improvement here). And you can even give it topic-specific prompts: a generic summary of Nvidia's last earnings call is less useful than a summary tailored to understanding cloud companies, competing GPU manufacturers, or PC gaming. This very much plays to humans' and LLMs' relative strengths: you're using your own personal context window to extract valuable bits of information by preemptively identifying which bits are valuable. The more you're constrained by the fact that there are only 24 hours in a day, and the more you benefit from the convexity of expertise, the more LLM-assisted summarization will help you.
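The mechanics of that workflow—splitting a transcript to fit the context window, then attaching a topic-specific prompt to each excerpt—can be sketched as follows. The function names are hypothetical, character counts stand in for real token counting, and the actual LLM call is left out.

```python
def chunk(transcript, max_chars=8000):
    """Split a long transcript into pieces that fit a context window.
    Characters are a crude proxy for tokens; a real version would use a
    tokenizer to count against the model's actual limit."""
    paragraphs = transcript.split("\n\n")
    chunks, current = [], ""
    for p in paragraphs:
        if current and len(current) + len(p) > max_chars:
            chunks.append(current)
            current = p
        else:
            current = current + ("\n\n" if current else "") + p
    if current:
        chunks.append(current)
    return chunks

def topic_prompt(excerpt, topic):
    """Build a topic-specific summarization prompt for one excerpt."""
    return (
        "Summarize the following earnings-call excerpt, focusing only on "
        f"implications for {topic}:\n\n{excerpt}"
    )

# Each excerpt would then be sent to the model with a tailored angle, e.g.:
# for excerpt in chunk(transcript):
#     summary = call_llm(topic_prompt(excerpt, "competing GPU manufacturers"))
```

The tailoring step is where the analyst's own context window earns its keep: the model summarizes whatever angle it's given, but choosing the angle is the part that requires knowing what matters.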

Using LLMs in production, or using them to write production code, naturally introduces lots of risks. You're getting non-deterministic outputs from deterministic inputs. Of course, that happens all the time: living things are very non-deterministic, even though we run on readable source code executed by precision machinery so tiny that it's only in the last decade that human-produced devices have gotten smaller.

You can, of course, worry about your LLM hallucinating, but some tasks are resistant to this because you can get external feedback. For example, if you build a webscraper, you generally know what kind of data you'll get back, and you can sanity-check against that. If the search function on a site says there are 815 instances of the results you want to scrape, and your scraper finds 815 of them, it's probably working right. In other words, if you're using AI-enhanced coding or AI-enhanced business processes, you'll naturally want to include lots of error-checking and logging—exactly what you should be doing anyway, and one of the things you'd be tempted to skip if you were short on time.
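A sketch of that kind of check, assuming the expected count comes from an external source like the site's own search page; the function name and setup are illustrative.

```python
import logging

logging.basicConfig(level=logging.INFO)

def check_scrape(expected_count, results):
    """Sanity-check scraped results against an externally reported count,
    e.g. the number of matches the site's own search function displays."""
    if len(results) != expected_count:
        logging.error(
            "expected %d results, scraped %d", expected_count, len(results)
        )
        return False
    logging.info("scrape looks complete: %d results", len(results))
    return True
```

In a real pipeline, a failed check would halt the downstream steps rather than let a silently incomplete dataset propagate.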

There are some other downstream consequences of these changes:

The economics of AI is a broad topic. The more general-purpose a technology is, the more likely it is that the biggest beneficiaries will be users rather than the sellers. The sheer volume of training data available to LLMs is an indication of how much demand there is for text, and how many lifetimes' worth of effort has been used to produce it. Now, the people whose work mostly consists of producing text, whether it's prose or code, will move up a layer of abstraction and make a lot more.

  1. This problem won't completely go away, but ChatGPT does make it fast to write tests, and in my checks with some simple scripts, it was happy to translate Python code into C++, Rust, and Go. So the new loop might be: write something in the most expressive language you can, write some tests for it, identify the bottlenecks, and rewrite those in something with good performance while ensuring that the tests still pass. Please use caution when doing this, of course; don't blame me if you end up writing high-throughput, low-latency, ultra-reliable systems that do exactly the wrong thing. ↩︎

  2. This will happen at multiple levels: if you're buying a business for the data, one thing you want to do is ensure that you can really access the data. Due diligence goes a lot faster when you can ingest a series of customer contracts into a system that spits out the provisions that determine data rights. The more a deal is predicated on using data, the more the M&A process means setting up a data pipeline. ↩︎

A Word From Our Sponsors

More and more investors use Daloopa to build and update their financial models. Daloopa captures every reported indicator—from GAAP metrics in SEC filings and press releases down to key performance indicators mentioned in management commentary, or buried in a footnote. Daloopa allows one-click model updates within minutes of new data releases. And with links back to source material, you can always double-check the math and get the numbers in context.

Daloopa is all about increasing idea velocity, and ensuring that investors spend less of their time looking for numbers and manually entering them into spreadsheets, and more time coming up with good trades.


Catching Up

Google is accelerating the rollout of AI in search, partly catalyzed by the fact that Samsung is considering making Bing the default search engine rather than Google.

On a side note, now is a great time to be a company that negotiates this kind of thing with Google. We don't have a clear view of what Bing's short-term economics look like, or how Bing values different kinds of training data, and it's been a long time since Google had a serious threat to its dominance. Meanwhile, Microsoft's CEO has mused about how Google makes more money from each PC running Windows than Microsoft does. So now is a wonderful time for browser companies, handset companies, etc. to drop little hints to Google.

But assuming this is something that Samsung is really considering, and not a negotiating trick: it's great for AI deployment that challengers face less risk in launching a new AI product, and incumbents feel the need to match it when it works. Right now, there are (correct) concerns that the gross margin on AI-assisted search is lower because the cost of a query is so much higher. But it's unclear what the long-term gross profits per query will look like. In mobile, the bull case a decade ago was that it was dilutive in the short term because payment on mobile was inconvenient and because bandwidth was low, but that mobile search produced more data and that ad units captured more screen real estate, so it would monetize better than desktop over time. This turned out to be largely true. In AI-powered search, there's room to increase the monetization surface because it's possible to suss out adjacent commercial intent. And since AI-powered searches can include more information, and can get more clarifications, the right price for ads is higher than it is for traditional search ads, and the ad load can go higher, too.

One model of this is that conversational search gets a general search engine halfway to being a vertical search product. Vertical search monetizes better than general on a revenue-per-user or revenue-per-click basis, and loses a lot of that monetization edge because it has to acquire users. But Google has user acquisition solved, and as the Samsung situation indicates, Microsoft may be looking to solve it for themselves, too.

Disclosure: Long Microsoft.

Private Capital

Analysts are worried that money will flow out of private equity and other alternatives now that yields elsewhere are so high ($, FT). The research quotes a BlackRock comment from a few months ago:

“If we go back to 1995, [in order] to get a 7.5% yield, which is what many institutions are looking for, a portfolio could be in 100% [invested in] bonds. If you fast-forward 10 years, in 2005, it had to be 50% bonds, 40% equities and 10% alternatives. Then move another 10 years and in 2016, you [could allocate] only 15% bonds, 60% equities and 25% alternatives. [ . . .] Now today to get that same 7.5% yield, a portfolio could be in 85% bonds and then 15% equities and alternatives.”

One useful lemma here is that pension funds with fixed return targets that don't adjust based on interest rates act as a stabilizing mechanism for markets. When rates are low, i.e. when growth is low, they're forced to allocate more money to higher-risk ventures, some of which will hopefully accelerate growth. But when the economy is hot and rates are high, their incentive is to pull money out of moonshot bets and levered companies, and move it into treasury bonds instead. An investor who had a hurdle rate like "the ten-year, plus four points" wouldn't make these shifts. But when growth is high, one way to slow it down is to reduce risk takers' access to capital.

AI Regulation

In 1865, the British parliament passed a forward-looking law regulating the use of self-propelled vehicles:

Secondly, one of such persons, while any locomotive is in motion, shall precede such locomotive on foot by not less than sixty yards, and shall carry a red flag constantly displayed, and shall warn the riders and drivers of horses of the approach of such locomotives, and shall signal the driver thereof when it shall be necessary to stop, and shall assist horses, and carriages drawn by horses, passing the same.

Today, the first through sixth best-selling car brands in the UK are from companies founded in other countries. It's tricky to write rules for new technologies because the best model we have to go on is to make them work exactly the way the previous ones did, but that tends to truncate the upside.

So: the European parliament is considering some proposals on AI ($, FT), including rules requiring AI companies to share whether or not they're using copyrighted training data and requiring chatbots to disclose that they're bots. The former is a decent idea; it will be useful, albeit complicated, to come up with some kind of licensing regime for data. (The alternative is that the entities with the biggest incentive to create new training data will be AI companies themselves, who won't release the data but will deliver the outputs.) Requiring AI-generated content to be labeled as such is not a good idea, though: it amounts to an indirect subsidy for having people do things computers are already better at. It's almost always more beneficial to maximize the upside from a new technology first and to minimize the economic downside through redistribution afterwards. On the other hand, jurisdictions that don't expect to account for much of the upside have a different set of incentives.

AI Stocks

Some small AI-adjacent companies are once again outperforming, with one recent de-SPAC up 15x in three days. Stepping back from the technology itself, this is usually a late-stage sign of market sentiment. The dot-coms that chugged along for a while in the late 90s before suddenly tripling in Q4 of 1999 were generally not the best of the bunch. This doesn't have a direct effect on users—the Internet experience in 2002 was a lot better than it was in 2000, even though most of the market cap associated with it had been obliterated. Usually, when public market investors start sorting through the garbage heap like this, it means that the obvious winners are well-funded and the lottery tickets look relatively more attractive.

But there's another layer to this: four of the five best-performing small-cap stocks ahead of the market open on Friday had tickers that ended in "AI", including "MRAI," the ticker for Marpai, whose website's copy opens by noting that it's pronounced "Mar-pay." (They may not be thematically appropriate for AI investors, but they do move fast: the company announced a secondary offering Friday evening, and by this morning had priced it at a 47% discount to Friday's close.) One plausible reason is the confluence of retail investor behavior and algorithms—not just the algorithms trading stocks, but the ones autocompleting results when people use their broker's app to make trades. Company names have an impact on performance, whether it's the alphabetical order effect or the phenomenon of public companies with similar names to private ones outperforming. When the trend investors are betting on is an abstract one like AI, rather than a brand name, there can be feedback loops, where user behavior on apps and trader behavior in markets creates a temporary, spurious correlation between unrelated companies.

Cars and Interfaces

The Diff has covered the car industry's evolution towards selling interfaces as well as vehicles ($): the more safety features improve, the more switching brands means switching to a different user interface, with some inconvenience and possibly some safety risk as well. As cars get more sophisticated, car companies want their economics to look like those of the software businesses that own the UI, not the hardware companies that assemble devices. So GM is moving to its own software instead of continuing to use CarPlay or switching to Android ($, WSJ). They still need to rely on outside companies for some things; it's been a while since the auto industry could afford a big media operation. It's hard to beat consumer Internet giants, but distribution is one way to do it, and that's what GM is betting on.

Diff Jobs

Companies in the Diff network are actively looking for talent. A sampling of current open roles:

Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.

If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.