Our new 6-server cluster provides reliability and scalability

We’ve recently moved our website hosting to a cluster – that’s a collection of servers connected to each other with the aim of improving performance and reliability. A cluster brings substantial benefits, both for our clients and for us. Most significantly, it offers far more capacity and reliability than a single server. This means we can provide true scalability, greatly reducing the risk that a single server failure takes a site offline and allowing auctions to be held on a much larger scale.

As large auctions reach their climax, the system receives thousands of bids, many of them simultaneous. If an auction runs into technical difficulties at this point, the result could be disastrous. It’s therefore essential that auction sites can handle such high bidding volumes.

A cluster is fully capable of providing this kind of scalability. By default, many sites are installed on only one server. However, if need be (and if the budget allows), we can run a site “across” two servers. This allows the site to cope with larger auctions or a larger number of users for as long as necessary.

The cluster is made up of six machines:

  • two load balancers – these distribute incoming traffic across the app servers, which helps sites run quickly and efficiently under high volumes of traffic and increases reliability.
  • two app servers – these run the websites themselves. We can add new app servers to this layer of the cluster to further improve performance.
  • two database servers – these are equipped with RAID, which protects against disk failure, and data is always replicated to the backup database machine to keep it safe. In the event of a failure we can transfer you to the other machine and continue providing you with your auction platform.
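
To make the idea of sharing traffic within a layer more concrete, here is a minimal sketch in Python of round-robin selection that skips machines marked as unhealthy. It is an illustration only, not our actual load balancer configuration: the `Layer` class, its methods, and the machine names `app1` and `app2` are all hypothetical.

```python
from itertools import count


class Layer:
    """One layer of a cluster, e.g. the two app servers (hypothetical model)."""

    def __init__(self, name, machines):
        self.name = name
        # Every machine starts out healthy.
        self.healthy = {machine: True for machine in machines}
        self._counter = count()

    def mark_down(self, machine):
        """Record that a machine has failed and should receive no traffic."""
        self.healthy[machine] = False

    def mark_up(self, machine):
        """Record that a machine has recovered and can receive traffic again."""
        self.healthy[machine] = True

    def pick(self):
        """Return the next healthy machine in round-robin order."""
        alive = [m for m, ok in self.healthy.items() if ok]
        if not alive:
            raise RuntimeError(f"no healthy machines in layer {self.name!r}")
        return alive[next(self._counter) % len(alive)]


if __name__ == "__main__":
    app_layer = Layer("app", ["app1", "app2"])  # assumed machine names
    print([app_layer.pick() for _ in range(4)])  # requests alternate
    app_layer.mark_down("app1")                  # simulate a failure
    print([app_layer.pick() for _ in range(4)])  # all requests go to app2
```

While both machines are healthy, requests alternate between them; once one is marked down, every request goes to the survivor, and the layer only raises an error when no machine is left.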

Both app servers run at the same time to share the load. However, the cluster can continue working normally even if only one of them is running. This means the cluster should only stop working if both machines in the same layer break down at once – a highly unlikely situation. If a technical problem comes up we can operate in failover mode, meaning another machine takes over the work of the one that has failed.
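
A rough calculation shows why losing both machines in a pair is so much less likely than losing one. The sketch below assumes that machines fail independently and with the same probability, which is a simplification, and the figure of 1% used in the usage lines is purely illustrative rather than a measured value for our hardware. The function names are hypothetical.

```python
def layer_outage_probability(p_machine, machines_per_layer=2):
    """Chance that every machine in one layer is down at the same time.

    Assumes independent failures with the same probability per machine.
    """
    return p_machine ** machines_per_layer


def cluster_outage_probability(p_machine, layers=3, machines_per_layer=2):
    """Chance that at least one layer has lost all of its machines."""
    layer_ok = 1 - layer_outage_probability(p_machine, machines_per_layer)
    return 1 - layer_ok ** layers


if __name__ == "__main__":
    p = 0.01  # illustrative assumption: each machine is down 1% of the time
    print("single machines per layer:", cluster_outage_probability(p, machines_per_layer=1))
    print("paired machines per layer:", cluster_outage_probability(p, machines_per_layer=2))
```

Under these assumptions, pairing the machines in each of the three layers cuts the chance of an outage from roughly 3% to roughly 0.03%.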

A helpful analogy might be that of a jet plane – if one of the engines encounters difficulties, this doesn’t mean the plane will crash. Instead, another engine will take up the work of the one that broke down and the plane will carry on safely (though perhaps not as quickly as before). If the failed engine begins working again then the plane can return to normal.

The Big Egg Hunt Auction was one of the first sites run on our 6-server cluster, and is a good example of the advantages a cluster can give. The key point of our brief was that the site must not fail at any cost. The cluster meant we could cope with the extremely high volume of bidding in the final hours of the auction, and also allowed us to use caching for the bidding updates as well as caching some of the key pages and images. This meant that we were easily able to deal with the traffic from all the promotional activity for the campaign, including a Stephen Fry tweet.
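
For readers curious about what caching bidding updates can look like, here is a minimal sketch of a short-lived (time-to-live) cache in Python. It is not the code used on the Big Egg Hunt site: the `TTLCache` class, the key format, and the two-second lifetime are assumptions chosen for illustration.

```python
import time


class TTLCache:
    """Keep each computed value for a short time before recomputing it."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so the cache can be tested without sleeping
        self._store = {}    # key -> (time stored, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only if it is stale."""
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]
        value = compute()
        self._store[key] = (now, value)
        return value


if __name__ == "__main__":
    def load_highest_bid():
        # Stand-in for a database query; the value is made up.
        return 1500

    cache = TTLCache(ttl_seconds=2)  # assumed lifetime
    # Hypothetical key; repeated calls within two seconds skip the "query".
    print(cache.get_or_compute("lot:42:highest_bid", load_highest_bid))
    print(cache.get_or_compute("lot:42:highest_bid", load_highest_bid))
```

The trade-off is that a visitor may see a bid total that is up to a couple of seconds old, in exchange for far fewer database queries when thousands of people refresh the same page.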

