I came across the power of two choices theorem recently; it was referenced in a tech talk about schedulers. The central premise of the power of two choices is simple, yet it is strikingly effective for making decisions about routing requests. The following is my restatement of the premise:
You have n options, each with an associated score or rank. You also have an incoming queue of decisions to make about which option to pick. At each decision point, if you pick two options from the n at random and then use the score to pick the better of those two, you will get exponentially better results (in terms of distributing load) than choosing a single option at random.
Going from the best of two to the best of three, four, or more only improves the results by a constant factor.
Using power of two choices really starts to pay off when n is very large and examining all n options becomes prohibitively expensive.
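To make the decision rule concrete, here is a minimal sketch in Python. This is my own illustration, not code from the paper; the `loads` list and `pick_two_choices` helper are hypothetical names, with "load" standing in for whatever score you rank options by.

```python
import random

def pick_two_choices(loads):
    """Sample two distinct options uniformly at random and
    return the index of the one with the lower load."""
    a, b = random.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b

# Hypothetical usage: route one request among 100 servers.
loads = [0] * 100
chosen = pick_two_choices(loads)
loads[chosen] += 1
```

Note that the decision costs O(1) per request regardless of n, which is why it wins when scanning all n options is too expensive.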
The paper applies this to load balancing and shows how effective it is there. I stated it in more abstract terms because, unsurprisingly, there are uses for this beyond load balancing algorithms. It has been shown to be effective in hashing, shared-memory emulation for DRAMs, circuit routing, and scheduling work onto servers.
All of this sounds really cool, but I wanted to understand why it works. Like other "magical" approximations I've learned about - Bloom filters, MinHash, t-digest, etc. - there is complex math behind it.
The paper models the problem using balls and bins instead of servers and requests. If we choose a bin uniformly at random among n bins to send each incoming ball to, the bin with the maximum load can be proven to have O(log n / log log n) balls with probability at least (n-1)/n. I tried to follow the proof behind that, but I couldn't understand it; it involves a Chernoff bound. I'll have to dig into this part later. For the purpose of this blog post, it is enough to understand that when picking a bin at random, the bin with the maximum load has, with high probability, O(log n / log log n) balls.
For the power of two choices, instead of picking a bin at random like we did above, we pick two bins independently and uniformly at random, and then send the ball to the less loaded of the two. Now the bin with the maximum load can be proven to have a load of ln ln n / ln 2 + O(1) with high probability. That is exponentially smaller than with the completely random method. But why?
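Before getting to the why, the gap is easy to see empirically. Here is a small simulation, my own sketch rather than anything from the paper, that throws n balls into n bins under both strategies and reports the maximum load:

```python
import math
import random

def max_load(n, choices):
    """Throw n balls into n bins. Each ball samples `choices` bins
    independently and uniformly at random (with replacement) and
    goes into the least loaded of them."""
    bins = [0] * n
    for _ in range(n):
        picks = [random.randrange(n) for _ in range(choices)]
        best = min(picks, key=lambda i: bins[i])
        bins[best] += 1
    return max(bins)

n = 1_000_000
print("one choice: ", max_load(n, 1))  # grows like log n / log log n
print("two choices:", max_load(n, 2))  # grows like ln ln n / ln 2
print("ln ln n / ln 2 =", math.log(math.log(n)) / math.log(2))
```

With a million bins, the two-choices maximum should stay within a small constant of ln ln n / ln 2 (about 3.8 here), while the single-choice maximum is noticeably larger.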
Intuitively, with the two-choices method and n bins, we can show that at most half the bins end up with 2 or more balls, at most a quarter have 3 or more, at most 1/16 have 4 or more, and at most 1/256 have 5 or more. To see why, let α be the fraction of bins that have at least i balls. For a ball to raise some bin to i+1 balls, both of its two random choices must land on bins that already hold at least i balls, which happens with probability at most α². So the fraction of bins with at least i+1 balls is bounded (roughly) by α², and the bound squares at every level. The number of balls whose two choices both land in such heavy bins can be bounded by a binomial random variable, so the proof again uses Chernoff bounds to make this layered induction rigorous and show that the maximum load is ln ln n / ln 2 + O(1).
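Those squared fractions can be sanity-checked empirically too. This is another self-contained sketch of mine (the `two_choice_bins` helper is a hypothetical name) that tabulates the fraction of bins holding at least i balls after a two-choices run:

```python
import random
from collections import Counter

def two_choice_bins(n):
    """n balls into n bins; each ball goes to the emptier of two
    bins picked independently and uniformly at random."""
    bins = [0] * n
    for _ in range(n):
        a, b = random.randrange(n), random.randrange(n)
        bins[a if bins[a] <= bins[b] else b] += 1
    return bins

n = 1_000_000
counts = Counter(two_choice_bins(n))
# Fraction of bins with at least i balls; by the squaring argument
# it should fall off roughly like 1/2, 1/4, 1/16, 1/256, ...
cumulative = 0
for i in range(max(counts), 0, -1):
    cumulative += counts[i]
    print(f"fraction with >= {i} balls: {cumulative / n:.6f}")
```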
I don't fully understand the probability math behind this proof, but I think the power of two choices is a great piece of knowledge to keep at the back of my mind. I no longer work at Indeed, but now I wonder whether the boxcar implementation could have been simplified on the client side by using the power of two choices, instead of maintaining a heap sorted by server slot numbers. Boxcar clients only ever talked to pools of at most a few hundred servers, so n is too small for this to be a meaningful performance optimization; maintaining a min-heap of size < 100 is not a bottleneck. Still, it might be interesting to compare the average load on the servers under both methods, as a fun theoretical exercise.
Update: Former co-worker Dan Heller pointed out that the power of two choices could apply to life outside of CS too. It should work when choosing which supermarket line to join, or which summer blockbuster movie to try getting tickets for.