Lamport's TLA+ for Spec Testing (and why you're not using it)
So there is this really amazing tool that lets you literally debug your design. You write your system specification in the "Temporal Logic of Actions" language, and you have an IDE with a model-checker that runs all possible execution paths of your spec in search of problems. Sounds like magic, doesn't it? So why isn't everyone using it?
Well, the short answer would be: because it's math. And math is hard, right?
So first of all: no, it isn't. And having clarified that (you're welcome), in this post I will try to convince you that you absolutely want to use TLA+, that soon enough it will not even be optional, and that it's 100% doable:
TLA+ can save you loads of time and money
More accurately put, TLA+ allows you to mathematically prove Safety and Liveness properties of your spec, i.e. that "bad things don't happen" and that "good things eventually happen" in your system. It is used regularly at Amazon and Microsoft, and in his talks Lamport gives an example where it helped find a hardware bug in an IBM chip for the XBox 360. An intern at Microsoft wrote a spec for the memory system of the XBox and discovered that it would freeze after 4 hours of continuous use (I'm no gamer, but I believe this translates to "the first time you use it"). This design flaw made it through all of IBM's reviews and tests, which demonstrates how TLA+ can identify edge-cases that are very difficult to come up with or test any other way.
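In TLA+ such properties are written as temporal-logic formulas. With some made-up variable names, a safety property and a liveness property look roughly like this:

```
NoOverdraft == [](balance >= 0)   \* safety: the balance is never negative
AllServed   == <>(pending = {})   \* liveness: eventually nothing is left pending
```

Here [] reads "always" and <> reads "eventually"; the TLC model checker verifies that every possible behavior of the spec satisfies them.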
And while it requires some mathematical skills to fully exploit the power of TLA+, you can start benefiting from it even before you master it completely. To see how, let's consider a theoretical, simplified trading algorithm:
Suppose you found a really reliable way to identify upward-trends in equity prices in real time, and you want to execute the following algorithm:
- Identify an equity whose price is rising
- Buy some amount of it
- Turn around and sell it as fast as you can
If this works properly and you manage to "ride" those trends more often than not, you can make a lot of money. And the risk? Well, the trend could reverse just as you try to sell and you'd end up losing. In a worst-case scenario the price could even crash right after you bought the equity and you could lose everything. But if you're a High Frequency Trader (one of those systems that always reach the market first), you're probably confident that in most cases you can sell fast enough to keep the risk small. You might do extensive unit-testing, run your code on replayed market data and in sandbox environments, and feel quite confident that you have a good money-maker on your hands. And if you were doing all that prior to the famous Flash Crash of May 2010, perhaps you'd find no other issues with your algorithm. Maybe you'd consider a total market crash scenario, but in that case your algorithm shouldn't trade, right? It's looking for upward-trends.
But now let's look at the TLA+ specification of this algo. Actually, it's written in the PlusCal language, which translates automatically into the weirder-looking TLA+ code.
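In outline, the trader process looks roughly like this (a simplified sketch; the module name, variable names, initial values and bounds are just illustrative, and the market's response to our sell order is left completely nondeterministic for now - we model the market itself below):

```
------------------------------ MODULE TradingAlgo ------------------------------
EXTENDS Integers

(* --algorithm trader
variables
  funds  = 4,    \* the cash we start with
  shares = 0,    \* the equities we currently hold
  price  = 1,    \* the price we last bought at
  ask    = 0;    \* the price we are currently asking when selling

process Trader = "trader"
begin
Buy:
  while TRUE do
    \* "identify" a rising price; in this model it can at most double
    with p \in (price + 1)..(2 * price) do
      price := p;
    end with;
    shares := funds \div price;      \* buy as much as our funds allow
    funds  := funds - shares * price;
    ask    := price + 1;             \* try to sell a bit above what we paid
Sell:
    while shares > 0 do
      either
        \* someone takes our offer at the asking price
        funds  := funds + shares * ask;
        shares := 0;
      or
        \* no taker: lower the ask and try again
        await ask > 1;
        ask := ask - 1;
      end either;
    end while;
  end while;
end process;
end algorithm; *)

\* To keep TLC's exploration finite, give the model a state constraint
\* such as price <= 8 /\ funds <= 100.
=================================================================================
```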
This process continuously "identifies" trends of rising prices of an equity (otherwise it would just sit there, and we want it to execute), buys the largest amount possible with its gains from the previous round, and attempts to sell a bit higher than it bought. If it fails to sell, it lowers its asking price and tries again. I want to focus on the selling behavior in this example, so I'm assuming nothing too weird happens when we buy equities for the next round (the price can at most double in my model).
This may seem like a very strange way to describe the algorithm, as the process simply assumes constant upward trends. But remember: what we're checking here isn't whether our algorithm manages to properly identify trends, or what our expected gains in the real market would be. There are plenty of tests for that. Here we want to check every possible path of execution in order to identify potential design flaws.
Now, in order to simulate its behavior, we also have to model the market. And here's where things get interesting. We need to simulate the matching algorithm of the market where we're trading. Let's just assume we are matched against any bid with a higher or equal price that's in the order-book when we reach it. But what's in the order-book?
In the common scenario we should find quite a few bids at price levels around our price, although technically, depending on the market's rules, there can be bids with much lower prices waiting for a suitable offer. Can there be no bids at all? Well, unless our equity has lost all its value permanently, we can assume that eventually we'll find a bid. How low can the bid be? Prior to the Flash Crash, market-makers (traders who are obligated to always buy and sell the equity) were allowed to use stub quotes, e.g. buy for a penny or sell for $10,000.
I think this already demonstrates how the formal-specification approach makes us ask all the right questions, which can lead us to identify potentially dangerous scenarios. Not very likely scenarios, but as they say - the law of large numbers is your friend only until you hit large numbers. In an environment where thousands of transactions for millions of dollars are executed every second, you really must consider all the edge cases.
So, leaving out the "empty book" scenario for simplicity (it doesn't change the result I want to show), let's look at the code for the matcher:
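Since the matcher has to talk to the trader, here is the whole model in one piece: the trader from the sketch above, now posting its offer through a few shared variables, plus a Matcher process that answers it. Again, this is a simplified sketch; the handshake and all the names are illustrative, and bid quantities are ignored.

```
----------------------------- MODULE TradingModel -----------------------------
EXTENDS Integers

Buyers    == {"b1", "b2"}   \* the two potential buyers of the small model
StubPrice == 1              \* a market-maker stub quote: "buy for a penny"

Max(S) == CHOOSE x \in S : \A y \in S : x >= y

(* --algorithm trading
variables
  funds    = 4,      \* the trader's cash
  shares   = 0,      \* the equities the trader holds
  price    = 1,      \* the price the trader last bought at
  ask      = 0,      \* the trader's current asking price
  offer    = 0,      \* shares currently offered for sale (0 = no offer out)
  fill     = 0,      \* price at which the current offer was matched (0 = no match)
  answered = FALSE;  \* has the market answered the current offer?

process Trader = "trader"
begin
Buy:
  while TRUE do
    \* "identify" a rising price; it can at most double in this model
    with p \in (price + 1)..(2 * price) do
      price := p;
    end with;
    shares := funds \div price;          \* buy as much as we can afford
    funds  := funds - shares * price;
    ask    := price + 1;                 \* try to sell a bit above what we paid
Sell:
    while shares > 0 do
      offer    := shares;                \* post the offer to the market
      answered := FALSE;
WaitFill:
      await answered;
      if fill > 0 then
        funds  := funds + shares * fill; \* sold (possibly for a penny...)
        shares := 0;
        offer  := 0;
      else
        ask := ask - 1;                  \* no taker: lower the ask and retry
      end if;
    end while;
  end while;
end process;

process Matcher = "market"
begin
Match:
  while TRUE do
    await offer > 0 /\ ~answered;        \* an unanswered offer is in the book
    \* each buyer bids some price up to a bit above the going price, or does
    \* not bid at all (0); the market-maker always bids, but possibly only a
    \* stub quote
    with quotes \in [Buyers -> 0..(price + 1)],
         mmBid  \in {StubPrice, price}
    do
      with matching = {q \in ({quotes[b] : b \in Buyers} \cup {mmBid}) : q >= ask} do
        if matching /= {} then
          fill := Max(matching);         \* matched at the best bid meeting our ask
        else
          fill := 0;                     \* nobody met our price
        end if;
      end with;
    end with;
    answered := TRUE;
  end while;
end process;
end algorithm; *)

\* As before, give TLC a state constraint (e.g. price <= 8 /\ funds <= 100)
\* so that the run terminates.
================================================================================
```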
Our matcher simulates bids from buyers whenever we want to sell. Each buyer "decides" how much to buy and at which price, or may not bid at all. There is always a bid from a market-maker, which might be a stub quote. Then, our offer is paired with those bids that meet our price.
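We also need to give TLC an invariant to check. A simple one, using the variable names from the sketch above, says that we are never left holding no stock while unable to afford even a single share:

```
CanStillTrade == shares > 0 \/ funds >= price
```

TLC will report an error trace whenever it reaches a state that violates this.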
And sure enough, running the model checker on a very small model (2 potential buyers in the "market") results in an error: I'm out of funds and can't buy more stocks. What happened? Reviewing the steps of the execution trace, I see that at some point the buyers weren't buying and I ended up selling to the market-maker for a penny. Another thing that happened: several times I "identified" rising prices, but then the price repeatedly went down just as I started to sell, and I ended up selling for less than I paid. Could those scenarios happen in real life?
In both cases, the answer appears to be yes. The first scenario is the equivalent of a sudden drop in market liquidity which leaves only stub quotes in the order book. According to the SEC, "executions against stub quotes represented a significant proportion of the trades that were executed at extreme prices on May 6" (the day of the Flash Crash). The second scenario can be interpreted as transient peaks in the price of an equity which is in fact losing its value. There is a very interesting analysis by Eric Hunsader of Nanex showing exactly this kind of volatile price behavior with a dominating downward-trend during the crash: a "teeth" pattern on the way down, before the final dive.
The data Hunsader shows in a documentary about the crash could suggest a hot-potato game where a group of algorithms becomes synchronized and they start buying and selling from/to each other, driving prices down. At least one major trading algorithm - shown as a red blob in Hunsader's visualization - seems to have identified a problem and was pulled out before the market crashed (although maybe a human stopped it, I don't know).
Now, I'm not saying those algorithms were anything like the simplified one I described in this post, but whatever they were, their designers clearly didn't consider the scenario that played out that day. And these were not just kids who don't know what they're doing and made careless mistakes. These were apparently very serious companies with fast, expensive systems and thoroughly tested algorithms. The thing is that edge-cases like these are very difficult to come up with through ordinary brainstorming. As I said, that's what is so great about TLA+.
This is not even optional
Recently, professors at several universities have started teaching TLA+ courses to their students. I believe it won't be long before this becomes a standard course in CS programs. Once a tool like TLA+ exists, can you really afford NOT to use it? Any software designer or architect who does use it will have a clear advantage over those who don't. Her designs will just be more reliable. If you don't want to be left behind, you should join the movement.
Of course you can do it
Yes, it's mathematics. Most of the notation comes from set theory and predicate logic, and you need to get used to mathematical proofs. I'm not going to pretend it's the same as learning Python or C++. But on the other hand, I don't think it's that different from getting used to thinking in Object Oriented terms. Personally, I remember that in my first semester as a student the concept of recursion was really difficult for me to wrap my head around. It's so counter-intuitive! But you get used to it. Our work involves lots of reasoning in terms which are alien to most people; math is just one more such unusual way of thinking about things. Lamport says that it takes people around two weeks to learn the basic concepts of TLA+, and then it's just practice. I guess it depends on how mathematically inclined you are to begin with, but it sounds about right. And people who have made the effort will tell you that it was absolutely worth it, and that perhaps the biggest benefit is precisely this new way of thinking about design problems, which really helps you become a better system designer.
http://research.microsoft.com/en-us/um/people/lamport/tla/tla.html