High Availability Architecture – (2/4) Scalability

In the previous post of this series we saw how to increase the availability of your IT system. In this one we’ll focus on how to scale it.

Don’t believe the RAC!

You may be tempted to think: since the bottleneck of any enterprise application is I/O, and therefore the database, I’ll start by clustering that layer. Especially if you have opened the door to Oracle sales representatives, who will try to sell you their (still excellent) Oracle RAC.

Well, actually the best way to cluster your infrastructure is not bottom-up but top-down. So the order in which to scale your 5 standard software layers (client, application, business, integration, database) is as follows:

  1. client/web layer
  2. application / business layer
  3. database layer

There is an excellent (and free) IBM Redbook describing the path to a highly available system, and this is the approach it recommends. Big Blue has been developing such HA solutions for about 40 years, so you can trust them when they address this issue.

Scaling Dimensions

Okay, now that we know where to start with clustering, a question remains: which type of clustering should we use? There are two of them:

  1. Symmetrical Multi-Processing (SMP – Scale Up – vertical): upgrading a single server with more CPUs and more memory
  2. Massively Parallel Processing (MPP – Scale Out – horizontal): installing multiple servers in parallel

Roughly speaking, scaling up is recommended for data-centric applications. This is because you don’t really want file lock management to be carried out and synchronized across multiple servers. So if you need to scale your database, your ERP or your CRM, scale it up.

On the other hand, scaling out is recommended for web servers. In that case, the goal is to share the workload of unrelated requests. Google is a perfect example: many simultaneous, unrelated requests are processed at the same time, and many different servers behind a load balancer will do the trick.
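As a minimal sketch of what "load balanced upfront" can mean, the snippet below hands unrelated requests to a pool of web servers in strict rotation (round-robin). The class name, server names and request strings are hypothetical, and round-robin is only one of several possible balancing policies.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next server in a fixed rotation."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        # cycle() repeats the server list endlessly, in order.
        self._rotation = cycle(list(servers))

    def route(self, request):
        """Return the (server, request) pair chosen for this request."""
        return next(self._rotation), request


# Hypothetical pool of three identical web servers.
balancer = RoundRobinBalancer(["web-1", "web-2", "web-3"])
assignments = [balancer.route(f"GET /search?q={i}")[0] for i in range(5)]
```

Because the requests are unrelated, no server needs to know what the others are doing, which is what makes adding a fourth or fifth machine cheap.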

There is a third scalability approach, which is a mixed one: scaling both up (a bigger server) and out (a second one). This one perfectly addresses scalability constraints for application servers. However, do yourself a favor and think twice before clustering EJBs.

So we know we’ll start by scaling our web servers horizontally, then mix both approaches for our application servers, and lastly scale up our database servers. Now let’s see how we should implement this clustering.

Cluster modes

There are two ways to set up clusters:

  1. Active / Passive: one server is up and running with all traffic directed to it, while the other one is asleep, ready to take over if the first server has an outage
  2. Active / Active: all servers are up and running, sharing the workload.

Active / Passive is the simplest approach and the favorite of operations teams because it is easier to manage. On the other hand, it is expensive, as half of the CPU power is paid for and does nothing most of the time. Besides, when considering such a solution you need to address the issue of transparent failover and the time it takes to switch. In any case, the Active / Passive set-up is the recommended approach for asynchronous servers such as JMS.

Active / Active is the most complicated option from the operations team’s perspective. However, it optimises the investment and the ratio of paid CPU to used CPU. You still need to keep a margin, though, so that the remaining servers can absorb the load of any server that stops. This is the recommended approach for application servers.
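The failover decision itself can be sketched in a few lines. This is an illustration only: the class, the node names and the `is_healthy` callback are hypothetical, and real products also have to deal with detection delay, in-flight messages and split-brain situations, which are ignored here.

```python
class ActivePassivePair:
    """Two nodes: one takes all the traffic, the other waits as a standby."""

    def __init__(self, active, passive):
        self.active = active
        self.passive = passive

    def route(self, is_healthy):
        """Return the node that should receive traffic.

        `is_healthy` is a caller-supplied function (an assumption of this
        sketch) that returns True when the given node answers its health check.
        """
        if is_healthy(self.active):
            return self.active
        if is_healthy(self.passive):
            # Promote the standby; the failed node becomes the new standby.
            self.active, self.passive = self.passive, self.active
            return self.active
        raise RuntimeError("no healthy node available")


# Hypothetical usage: a set of node names currently considered down.
down = set()
pair = ActivePassivePair("jms-a", "jms-b")
first = pair.route(lambda node: node not in down)   # primary is healthy
down.add("jms-a")                                   # simulate an outage
second = pair.route(lambda node: node not in down)  # standby takes over
```

The time to switch mentioned above is whatever it takes for the health check to notice the outage plus the time for the standby to start accepting traffic.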
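The size of that margin can be estimated with simple arithmetic. The sketch below assumes identical servers and a load that is spread evenly, which is a simplification; the function name is made up for this example.

```python
def max_safe_utilization(nodes, tolerated_failures=1):
    """Highest average utilization per node that still lets the surviving
    nodes absorb the full load after `tolerated_failures` nodes stop.

    Assumes identical nodes and evenly spread load.
    """
    if not 0 <= tolerated_failures < nodes:
        raise ValueError("tolerated_failures must be >= 0 and < nodes")
    return (nodes - tolerated_failures) / nodes


# Four active nodes, one allowed to fail: run each at 75% or less.
four_node_limit = max_safe_utilization(4)
# Two active nodes, one allowed to fail: 50%, no better than Active / Passive.
two_node_limit = max_safe_utilization(2)
```

Under these assumptions, the more servers share the load, the smaller the fraction of paid CPU that has to sit idle as a safety margin.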

Database Clustering

So you have scaled up your database onto the biggest server on the face of the earth, but it still cannot cope with your traffic: you need to set up a database cluster.

There are four clustering solutions available to you:

  • Share nothing: each server has its own tables and memory. This is also called partitioning. You need to make sure that you don’t have any joins between tables and that the tables are completely independent of one another. Otherwise you may end up doing 2 I/Os on 2 different servers for a single call. This is the DB2 approach.
  • Redundant: each server has a full copy of the database. This approach implies that each change in one copy of the database is propagated to the other servers, which may prove to be quite cumbersome. This is the MySQL approach.
  • Disk Share Technique: all the data resides on a shared drive (such as a SAN – Storage Area Network) and all servers access this disk in read mode, while only one accesses it in write mode. This is the Sybase clustering solution.
  • Disk and Memory Share: this is the purest version of clustering. Only Oracle RAC offers this feature. Servers are just CPU boxes and they share network memory and network storage. It is a complete failover solution, but extremely complicated to operate and very expensive.
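To make the share-nothing option more concrete, here is a minimal sketch of how a client could decide which server owns a given row by hashing its key. This is an illustration under stated assumptions, not a description of how DB2 or any other product implements partitioning; the function and server names are hypothetical.

```python
import hashlib


def partition_for(key, servers):
    """Pick the server that owns `key` in a share-nothing layout.

    A stable hash is used (rather than Python's built-in hash(), which is
    randomized per process for strings) so that every client maps the same
    key to the same server.
    """
    if not servers:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


# Hypothetical partitions; each one holds its own independent tables.
db_servers = ["db-0", "db-1", "db-2"]
owner = partition_for("customer:1042", db_servers)
```

A query that needs rows owned by two different servers has to visit both, which is the "2 I/Os on 2 different servers" cost described in the first bullet.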

In the next post we’ll concentrate on the performance of HA architectures.
