Saturday, October 3, 2015

Server Redundancy - Clustering info

Server Redundancy

Server redundancies include failover clusters and load balancing. Failover clusters remove a
server as a single point of failure. If one node in a cluster fails, another node can take over.


Some services require a high level of availability, and it’s possible to achieve 99.999 percent
uptime, commonly called five nines. It equates to less than 6 minutes of downtime a year: 60 minutes
per hour × 24 hours per day × 365 days per year × 0.00001 = 5.256 minutes. Failover clusters are a
key component used to achieve five nines.
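
The arithmetic above generalizes to any availability target. The following is a minimal sketch in Python; the function name allowed_downtime_minutes is made up for illustration, and it assumes a 365-day year as in the calculation above.

```python
MINUTES_PER_YEAR = 60 * 24 * 365  # 525,600 minutes, assuming a 365-day year


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the yearly downtime budget in minutes for an uptime percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


# Compare a few common availability targets.
for label, pct in [("three nines", 99.9), ("four nines", 99.99), ("five nines", 99.999)]:
    minutes = allowed_downtime_minutes(pct)
    print(f"{label} ({pct}%): {minutes:.3f} minutes per year")
```

Each additional nine cuts the downtime budget by a factor of ten, which is part of why the last nine is the expensive one.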

Although five nines is achievable, it’s expensive. However, if the potential cost of an outage is
high, the high cost of the redundant technologies is justified. For example, some web sites generate a
significant amount of revenue, and every minute a web site is unavailable represents lost money.
High-capacity failover clusters help keep the service available even if a server fails.

Failover Clusters for High Availability

The primary purpose of a failover cluster is to provide high availability for a service offered by
a server. Failover clusters use two or more servers in a cluster configuration, and the servers are
referred to as nodes. At least one server or node is active and at least one is inactive. If an active
node fails, an inactive node can take over the load with little or no interruption to clients.

Consider Figure 9.1, which shows a two-node failover cluster. Both nodes are individual
servers, and they both have access to external data storage used by the active server. Additionally, the
two nodes have a monitoring connection to each other, used to check each other’s health, or
heartbeat.

Figure 9.1: Failover cluster

Imagine that Node 1 is the active node. When any of the clients connect, the cluster software
(installed on both nodes) ensures that the clients connect to the active node. If Node 1 fails, Node 2
senses the failure through the heartbeat connection and configures itself as the active node. Because
both nodes have access to the shared storage, there is no loss of data for the client. Clients may notice
a momentary hiccup or pause, but the service continues.
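
The failover sequence described here can be sketched as a small simulation. This is an illustration only, not how any particular cluster product works: the Node and TwoNodeCluster classes, the 3-second HEARTBEAT_TIMEOUT, and the use of a plain dictionary as "shared storage" are all assumptions, and timestamps are passed in by hand rather than read from a real network heartbeat.

```python
from dataclasses import dataclass, field

# Assumed value: seconds of silence before the active node is presumed failed.
HEARTBEAT_TIMEOUT = 3.0


@dataclass
class Node:
    name: str
    last_heartbeat: float = 0.0

    def beat(self, now: float) -> None:
        """Record that this node was alive at time `now`."""
        self.last_heartbeat = now


@dataclass
class TwoNodeCluster:
    active: Node
    standby: Node
    # Stand-in for the external storage both nodes can reach.
    shared_storage: dict = field(default_factory=dict)

    def write(self, key: str, value: str) -> None:
        """Clients always write through whichever node is currently active."""
        self.shared_storage[key] = value

    def check(self, now: float) -> bool:
        """Promote the standby if the active node's heartbeat is stale.

        Returns True if a failover happened.
        """
        if now - self.active.last_heartbeat > HEARTBEAT_TIMEOUT:
            self.active, self.standby = self.standby, self.active
            return True
        return False


node1, node2 = Node("Node 1"), Node("Node 2")
cluster = TwoNodeCluster(active=node1, standby=node2)

cluster.write("order-42", "paid")      # client data lands in shared storage
node1.beat(now=1.0)
node2.beat(now=1.0)
healthy_check = cluster.check(now=2.0)  # heartbeat is fresh, nothing changes

node2.beat(now=5.0)                     # Node 1 has gone silent since t=1.0
failed_over = cluster.check(now=5.0)    # Node 2 promotes itself

print(cluster.active.name, cluster.shared_storage)
```

Because the data lives in the shared storage rather than on Node 1, the promoted node sees the same records the clients wrote before the failure.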

You might notice that the shared storage in Figure 9.1 represents a single point of failure. It’s not
uncommon for this to be a robust hardware RAID-6. This ensures that even if two hard drives in the
shared storage fail, the service will continue. Additionally, if both nodes are plugged into the same
power grid, the power represents a single point of failure. Each node can be protected with its own
uninterruptible power supply (UPS) and connected to a separate power grid.

Cluster configurations can include many more nodes than just two. However, nodes need to have
close to identical hardware and are often quite expensive. If a company truly needs to achieve
99.999 percent uptime, it’s worth the expense.
