Adversarial Attacks in Statistical Arbitrage

Plus! Banking; Financial Engineering; Hiring and Game Theory; Currency Blocs; AI SEO

In this issue:


Adversarial attacks in statistical arbitrage

During the week of August 7th, 2007, some of the best-known quantitative trading firms suffered significant losses that threatened their businesses. Scott Patterson recounts this in his book The Quants, and MIT professor Andrew Lo wrote a paper about the crisis.

These accounts overlook a key question. How did Renaissance’s Medallion Fund, one of the most celebrated quantitative hedge funds of all time, suffer losses at the same time as pedestrian operators such as GSAM and Deutsche Bank?

The popular narrative is contagion writ large. But this is puzzling for several reasons. To start, the Medallion fund has not before or since experienced losses on a similar scale (there are rumors of a brief but significant drawdown in March 2020, but no substantiated reports), so if it was contagion, Medallion was somehow uniquely contagious in 2007 in a way it wasn't in 1998, 2000-2, or for that matter 2008, when its return was 98.2%. Previous analysis of Medallion's monthly returns shows almost zero correlation with any well-known factors, and from 2000 to 2010 the fund suffered only three down months, each a loss of less than 1%.

A more refined answer is that there were two kinds of systems, with varying degrees of sophistication, interacting with each other. One group pursues well-known factors such as value, momentum, low beta, and earnings quality, along with some of the lesser-known factors that providers such as Barra offer. The second group pursues strategies with a greater number of inputs and more nuanced model-fitting, signal aggregation, and execution techniques.

Imagine there is one agent (widely reported to be GSAM's Global Alpha) that starts liquidating a vanilla factor portfolio in mid-July 2007. They have a long position in some stock and they keep liquidating a little bit, day after day after day. Eventually the very smart models of the second group (let's say Medallion) identify that this stock is badly dislocated from what their models think is fair value, and they begin buying it.

After enough buying of certain stocks and selling of others, Medallion effectively becomes the buyer of the larger liquidating portfolio. The liquidation of the first portfolio caused the distribution of live data to shift, in an extreme way, from the distribution the second group's models were fit on. An analogous example from spam filtering: an attacker inserts innocuous words or misspells the spammy ones, shifting the classifier's input distribution. After some time, one systematic trading system ended up owning the other's holdings and essentially morphing, at least temporarily, into another systematic trading system. (This is something every investor has to think about from time to time: if you’re trading against some big market dislocation, you’re making the exact set of trades that caused some other market participant to be in a position where they were forced to liquidate.)
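To make the mechanism concrete, here is a toy sketch (nobody's actual model; every number below is invented) of a mean-reversion signal meeting a slow forced liquidation: the price keeps drifting away from the model's notion of fair value, so the model keeps adding to the position, even though the data it is now trading on looks nothing like the data it was fit on.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sketch: a mean-reversion trader buys whenever a stock looks "cheap"
# relative to its model of fair value, while a forced seller keeps pushing
# the price down. All parameters are invented for illustration.
fair_value = 100.0   # the model's estimate of where the stock "should" trade
daily_vol = 1.0      # the model's volatility estimate, in price units
entry_z = 2.0        # buy when the stock looks two sigma cheap

price = fair_value
position = 0.0
pnl = 0.0

for day in range(30):
    forced_selling = -0.5                   # persistent liquidation pressure
    prev_price = price
    price += forced_selling + rng.normal(0.0, daily_vol)

    pnl += position * (price - prev_price)  # mark existing inventory to market

    z = (price - fair_value) / daily_vol    # how dislocated does the stock look?
    if z < -entry_z:
        position += 1.0                     # "buy the dislocation" -- and keep buying

print(f"final position: {position:.0f} units, cumulative pnl: {pnl:.2f}")
```

In normal conditions the dislocation reverts and the trade pays; here the "signal" is just the footprint of someone else's forced selling, so the model accumulates a large position into a move it can't explain.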

By the second week of August the Medallion fund began experiencing losses of its own as the original cohort continued to liquidate their positions. With no firm idea of when the liquidations would cease, Medallion began liquidating its own portfolios. The discretionary decision of RenTech founder and CEO Jim Simons to cut risk in the Medallion fund (and the internal controversy that ensued) is detailed in Greg Zuckerman’s book The Man Who Solved the Market.

Recasting the 2007 crisis as a sequenced liquidation, with simple models inadvertently “attacking” more sophisticated models, is a departure from the conventional narrative and has important consequences. First, the 2007 quant crisis can be viewed as a non-malicious but adversarial machine learning attack. Second, finance might offer an ideal setting to study multi-agent interactions, thanks to (a) compelling incentives (millions of dollars) and (b) little cooperation (binding NY-law non-compete agreements).

It’s also an instance of a general phenomenon where the more dislocated the market gets, the simpler the optimal strategy. A complex strategy implicitly relies on other people running a series of simple strategies whose rough workings can be predicted, but as the more algorithmic and backwards-looking market participants get blown out entirely, the behavior of the marginal trader gets harder to predict. As a result, the market moves backwards in time, to a point where simpler and simpler ideas can have attractive risk-rewards. You don’t get many chances to practice the kind of value investing espoused in The Intelligent Investor, or to buy the sorts of companies profiled in Common Stocks and Uncommon Profits at anything approaching the valuations Phil Fisher got—but at times like March 2020 or February 2009, you could buy pretty decent businesses at single-digit P/E ratios and didn’t have to stretch past the teens to own companies that were generational profit machines. (This reaches its extreme when the obvious trade is to buy or sell the S&P, or treasury bonds, i.e. to just buy the most liquid possible form of beta instead of pursuing any specific alphas.)

Researchers have long known that simple reinforcement learning algorithms competing with one another often learn, like humans, that collusion to fix higher prices is a winning strategy. More recent papers, like this one, show that LLMs do the same thing. This falls into the category of a deliberate conspiracy and naturally draws the attention of regulators.
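For a sense of how little machinery that first result takes, here is a minimal sketch in the spirit of those experiments (the demand model, price grid, and learning parameters are all invented, and whether the agents settle on supracompetitive prices depends on such choices): two independent Q-learners repeatedly set prices, each rewarded only with its own profit, each conditioning on last round's prices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Minimal sketch of an algorithmic-pricing experiment: two independent
# Q-learners set prices in a repeated duopoly, each rewarded with its own
# profit. Demand model, price grid, and hyperparameters are invented.
PRICES = np.linspace(1.0, 2.0, 6)   # candidate prices; marginal cost = 1.0
N = len(PRICES)
COST = 1.0

def profits(p1, p2):
    # Logit-style split of a fixed market: the cheaper firm captures more demand.
    w1, w2 = np.exp(-3.0 * p1), np.exp(-3.0 * p2)
    share1 = w1 / (w1 + w2)
    return (p1 - COST) * share1, (p2 - COST) * (1.0 - share1)

# State = the pair of prices played last round; Q[i][state] is a vector over actions.
Q = [np.zeros((N, N, N)) for _ in range(2)]
alpha, gamma = 0.1, 0.9
state = (0, 0)

for t in range(300_000):
    eps = max(0.01, np.exp(-1e-4 * t))   # decaying exploration
    acts = [int(rng.integers(N)) if rng.random() < eps
            else int(np.argmax(Q[i][state])) for i in range(2)]
    rewards = profits(PRICES[acts[0]], PRICES[acts[1]])
    next_state = (acts[0], acts[1])
    for i in range(2):
        td_target = rewards[i] + gamma * np.max(Q[i][next_state])
        Q[i][state][acts[i]] += alpha * (td_target - Q[i][state][acts[i]])
    state = next_state

print("prices played after learning:",
      PRICES[int(np.argmax(Q[0][state]))], PRICES[int(np.argmax(Q[1][state]))])
```

In the richer versions of this setup studied in the literature, agents like these reliably settle above the competitive price and punish deviations; the point of the sketch is just that nothing in the reward function tells them to cooperate.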

But another type of collusion is the accidental and emergent flavor. This is what happened in 2007. The participants had similar models and likely a lot of the same data, and in one state of the world they could probably have been accused of manipulating prices to their benefit.[1] But in another state of the world, when humans forced a few of the systems to liquidate, the emergent collusion almost killed the entire ecosystem.

A major concern about building agentic AIs to do the bidding of humans is unintended consequences from the interaction of algorithms with good intentions and good design. The events of August 2007 highlight that this is a real concern. A human in the loop with orders to liquidate caused the problem. Human-directed risk cutting likely made the problem worse: indeed, the post-mortem described in The Man Who Solved the Market suggests that Medallion would have performed better had it not cut risk. However, when the leaders of these firms coordinated to stop selling, they likely averted an even worse crisis.

It’s interesting to think through what the right approach for agentic AI is in a case like this. One argument might be that an AI agent running a portfolio should be cutting risk faster than a human trader would when people start talking about a number of standard deviations higher than three or so with a straight face. And it’s possible that this agent could be handed over to a meta-agent that tries to implement whatever the maximally effective adversarial strategy would be: if the smart move is long X and short Y, and smart people are getting liquidated left and right, the meta-smart move is probably to be short X and long Y, and, perhaps, to fixate less on market impact than a trader in a choppy market typically would. So the stack is the same one that works so well across the finance industry: 99% of the time, it’s run by the smartest person possible, and the other 1% of the time, it’s run by someone who illustrates the difference between “smart” and “cunning.”
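As a rough illustration of what that first agent's de-risking rule might look like (purely hypothetical; the three-sigma threshold and the linear cut schedule below are invented), the policy is simply to hold exposure while realized moves stay inside the volatility model and to cut hard once they don't:

```python
# Hypothetical de-risking rule for an agent running a portfolio: hold exposure
# while realized moves stay within the model's volatility forecast, and cut
# progressively harder as they exceed it. Thresholds are invented.
def target_exposure(current_exposure: float,
                    daily_pnl: float,
                    predicted_daily_vol: float,
                    sigma_cap: float = 3.0) -> float:
    """Return the next day's gross exposure given today's realized move."""
    realized_sigmas = abs(daily_pnl) / predicted_daily_vol
    if realized_sigmas <= sigma_cap:
        return current_exposure                 # within model: hold the line
    # Outside the model: cut in proportion to how far outside we are,
    # going flat entirely at twice the cap.
    cut = min(1.0, (realized_sigmas - sigma_cap) / sigma_cap)
    return current_exposure * (1.0 - cut)

# Example: a "six sigma" day on a book with $1m of predicted daily volatility
print(target_exposure(100e6, daily_pnl=-6e6, predicted_daily_vol=1e6))  # 0.0
```

The interesting design question is the second half of the paragraph above: who, or what, gets handed the book once a rule like this has taken it flat.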

The 2007 quant crisis gives us a glimpse of an agentic-AI-driven world where people and companies parallelize their interactions with one another and adversarial behavior emerges. How much progress we make on AI alignment and human-in-the-loop research agendas will have a lot to say about how much we should worry about this problem. But adversarial event detection probably deserves wider attention from practitioners and academics.


  1. This paper has a somewhat conspiratorial tone, but presents some striking data indicating something odd is happening. ↩︎

Diff Jobs

Companies in the Diff network are actively looking for talent. See a sampling of current open roles below:

Even if you don't see an exact match for your skills and interests right now, we're happy to talk early so we can let you know if a good opportunity comes up.

If you’re at a company that's looking for talent, we should talk! Diff Jobs works with companies across fintech, hard tech, consumer software, enterprise software, and other areas—any company where finding unusually effective people is a top priority.

Elsewhere

Banking

Chinese banks are increasingly squeezed by shrinking interest rate spreads: the rates on long-term loans are dangerously close to the rates the banks pay for deposits ($, Nikkei). China has severe economic problems in the form of an aging population and an overhang of real estate debt, but it also has a uniquely wide range of ways to deal with these kinds of problems. It's hard to talk about historical precedent given how much China's system has evolved, but at various times in the past the Chinese banking system has been the temporary shock-absorber for macro problems—and while that's historically meant credit problems, there's no particular reason duration losses can't be absorbed by the banks for a while and then worked off over time.

Financial Engineering

What's unique about Softbank is not just the portfolio, but the fact that the firm delights in complex financial engineering to buy and sell the exact return profile it wants to target. The latest instance of this is Softbank's proposal for a joint US/Japan sovereign wealth fund ($, FT). Softbank tends to be levered long equities, so anything they can do to reduce downside, like breaking a trade deal logjam, is beneficial. They like to back capital-intensive businesses—sometimes their presence makes a business more capital-intensive all on its own—and when those businesses are strategic, they'll tend to get government funding.

Hiring and Game Theory

One of the core competencies of investment banks is to spend two years turning entry-level employees into financial modeling machines, at which point some of those new employees will stick around and many of the others will depart for higher-paying jobs in private equity. And the curse in this business is that the more reliable a bank is at this kind of training, the earlier in a new banker's life the PE firms start hiring. This has reached the absurd point where new analysts lock in their post-banking job around the same time they start the banking job itself. Last year, recruiting started on June 23rd, but this time it started last week, albeit with just one firm, THL ($, FT).

There's something almost touching about how the big banks end up being the lumbering, unsophisticated counterparty that gets ruthlessly picked off by 22-year-olds, but it's hard to break the deadlock: the banks are starting to ask new analysts to disclose their job offers, for the sympathetic reason that they shouldn't be put on a deal where they're negotiating against a future employer, and for the less sympathetic reason that the bank will probably treat them differently if it knows they aren't sticking around. But private equity still has all of the leverage in this situation, and the only way around that is for either banks or PE firms to collude on a fairer time for the hiring season to start in earnest.

Currency Blocs

One of the US's exports is asset management: American asset managers raise money overseas, and some of the upside from this stays in the US—and since that overseas money often comes from countries that run a trade surplus, the US financial sector gradually recaptures some of the share of global trade that American manufacturing has lost. But some European investors are warning US asset managers not to be too Trump-aligned in their approach. The US still has a comparative advantage in both generating alpha and gathering assets, but the US is also not the only country willing to tolerate deadweight loss in order to achieve broader economic aims.

AI SEO

This is a useful high-level overview of how to optimize content so that it leads to referrals from LLM chatbots, analogous to the older business of trying to get organic traffic from search engines. For a long time, the party line among search engines was that the highest-ROI moves were 1) putting a little effort into things like having a sitemap, making sure text wasn't in images, wrapping titles and subtitles in the appropriate HTML tags, etc., and 2) writing good content that deserved to rank #1 for the query in question. And the party line among SEO consultants was that no, there were plenty of ways to make the algorithm work for you that extended beyond this. But LLMs are probably closer to the search engines' party line, and only getting closer: the actual way to get LLM traffic is to be cited as a source, and the way to get that is to say something original that answers a question a chatbot user might have. The Diff is getting a growing share of new subscribers from LLMs, but that's downstream from the earlier goal of being a newsletter busy people don't skip.