One of the things we discovered when benchmarking our improvements to OpenX 2.6 is that benchmarking is actually very difficult to do on EC2. My assumption about how ELBs work inside Amazon is that they are built on top of EC2 instances, and that you start off with one EC2 instance per availability zone you have selected. The load balancers are then themselves load balanced via DNS round-robin. This allows Amazon to treat every AZ as physically isolated, with no cross-zone dependencies.
So here is why it is difficult:
- If you fire traffic at your load balancer in a naive way, you will often find that you always hit just a single load balancer in one availability zone, because the DNS result is cached. This seems to max out at around 20K requests/minute, even if you have sufficient capacity behind the balancers.
- Even if you fire traffic from multiple locations to get around the cached DNS result, the load balancer still starts off scaled down. As I said above, I think you start off with one EC2 instance per selected availability zone. Amazon seems to employ its own auto scaling to detect how much capacity you need and expand the resources accordingly. From my anecdotal evidence, you should expect this to take 30 minutes to 1 hour.
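To see the DNS round-robin behaviour for yourself, you can check how many distinct IP addresses the load balancer's hostname currently resolves to. The following is a minimal sketch using only Python's standard `socket` module. The function names are my own, and treating the IP count as a proxy for the number of load balancer nodes is an assumption that follows from my guess above, not something Amazon documents here.

```python
import socket


def distinct_ips(addrinfo):
    """Return the sorted, de-duplicated IPs from getaddrinfo-style tuples."""
    # Each entry is (family, type, proto, canonname, sockaddr),
    # and sockaddr[0] is the IP address as a string.
    return sorted({entry[4][0] for entry in addrinfo})


def resolve_ips(hostname, port=80):
    """Resolve a hostname and return the distinct IPs the resolver returns.

    Assumption: each IP corresponds to one load balancer node. The result
    reflects whatever your local resolver has cached at the moment.
    """
    results = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    return distinct_ips(results)
```

Calling `resolve_ips(...)` with your own ELB hostname before and during a load test gives a rough indication of whether the balancer has scaled out. If the list stays at one address per availability zone, you are probably still hitting the scaled-down configuration.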
We went live in December 2011 with our OpenX 2.6 changes already knowing about this warm-up limitation, but expecting it to take closer to 20 minutes (we were on a short deadline and running out of capacity in our data center). It was a test of nerves, to say the least.
What I know now is that you don’t have to take the hit at all. All you need to do is buy support from Amazon, then open a ticket and ask them to manually scale up your load balancer to handle X requests. They will ask you to specify the timeframe you need this manual scaling for (since they don’t like to keep things in manual mode), but other than that, this avoids all the pain I described. Fast forward to 2012: I managed to serve a peak of 242K requests/minute during an Apple product launch, and the servers didn’t break a sweat.