Heartbeat / Linux-HA

Submitted by dlang on Tue, 2007-10-16 22:51. Availability | Unix

Failover and availability clustering for *nix systems

linux-ha.org

Wed, 1998-03-18 10:58

failover cluster high-availability HA heartbeat

Mature

Linux-HA, also known as Heartbeat, is a modular package for controlling high-availability clustering. In spite of its name, it is not limited to Linux (although that is the primary platform). It has an automake-based build and has been used on *BSD, Solaris, and to some extent on AIX as well.

It can heartbeat in multiple ways (UDP broadcast, multicast, and unicast, as well as over serial ports, although the serial port heartbeat has been accidentally broken in some versions) and over multiple channels (up to 32 at the time of writing).

It can support sub-second failover.

It delays heartbeat checking on initial boot to give switches time to get through their spanning tree detection timeouts.

It is designed to be secure, with encryption, anti-spoofing, and anti-replay mechanisms.
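To make the anti-spoofing and anti-replay ideas concrete, here is a minimal sketch of how a receiver might reject forged or replayed heartbeat packets. It is an illustration only, not Heartbeat's actual wire format: the `sign` function, the `Receiver` class, and the packet layout (an 8-byte sequence number followed by the payload) are all assumptions made for this example, and it covers authentication and replay detection, not encryption.

```python
import hashlib
import hmac


def sign(key: bytes, seq: int, payload: bytes) -> bytes:
    """Return a keyed digest over the sequence number and payload (hypothetical layout)."""
    message = seq.to_bytes(8, "big") + payload
    return hmac.new(key, message, hashlib.sha1).digest()


class Receiver:
    """Accepts a packet only if it is authentic and newer than anything seen so far."""

    def __init__(self, key: bytes):
        self.key = key
        self.highest_seq = -1  # no packet accepted yet

    def accept(self, seq: int, payload: bytes, digest: bytes) -> bool:
        expected = sign(self.key, seq, payload)
        # Anti-spoofing: a sender without the shared key cannot produce this digest.
        if not hmac.compare_digest(expected, digest):
            return False
        # Anti-replay: an old packet captured off the wire has a stale sequence number.
        if seq <= self.highest_seq:
            return False
        self.highest_seq = seq
        return True
```

With a shared key on both nodes, a packet signed with the wrong key is rejected by the first check, and a captured packet that is sent a second time is rejected by the second.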

It has two forms of configuration:

haresources (all versions)
    a text config file with one line per group of resources to be treated as a set
    limited to 2 machines per cluster
    monitors system up/down status
    allows resources either to prefer one box of the cluster or to stay where they are instead of moving back when the 'primary' machine restarts
    detects split-brain operation (both boxes thought the other box was dead) and restarts both
    includes STONITH to prevent split-brain operation (Shoot The Other Node In The Head: power the other node off if it isn't healthy enough to respond)
    supports 'ping nodes': both nodes ping external boxes to tell the difference between a network switch going down and the other system going down
    only fails over if the other system is completely unresponsive over all heartbeat channels
    multi-process operation, so that monitoring does not affect heartbeats
crm (2.x versions)
    supports everything haresources does, plus:
    multiple machines per cluster, tested extensively up to 16 machines
    resource monitoring
    overall system health checking (this also lets you trigger a failover if you lose some, but not all, of your heartbeat channels)
    a GUI config tool
    on-the-fly reconfiguration for most operations (and improving)
    works with the cluster-IP feature on Linux to do limited load sharing between nodes

Heartbeat uses an init-style script (start, stop, and monitor) for each resource, which allows for easy customization.
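As a rough illustration of what such a script looks like, here is a minimal sketch that dispatches on the three actions. It assumes that any executable accepting the action as its first argument can serve as a resource script; the status file path, the `run_action` function, and the exit codes (0 for success, non-zero otherwise) are hypothetical choices for this example rather than anything Heartbeat prescribes.

```python
#!/usr/bin/env python3
"""Hypothetical init-style resource script that marks a box active or standby."""
import sys

# Hypothetical status file, used here only for illustration.
STATUS_FILE = "/tmp/ha-demo-status"


def run_action(action, status_file=STATUS_FILE):
    """Handle one init-style action and return an exit code (0 means success)."""
    if action == "start":
        # The box has just become active for this resource.
        with open(status_file, "w") as fh:
            fh.write("active\n")
        return 0
    if action == "stop":
        # The box is giving the resource up.
        with open(status_file, "w") as fh:
            fh.write("standby\n")
        return 0
    if action == "monitor":
        # Report success only while the resource is marked active.
        try:
            with open(status_file) as fh:
                return 0 if fh.read().strip() == "active" else 1
        except FileNotFoundError:
            return 1
    print("usage: %s {start|stop|monitor}" % sys.argv[0], file=sys.stderr)
    return 2


# Only dispatch when an action was passed on the command line.
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_action(sys.argv[1]))
```

Customizing a resource then comes down to changing what the start, stop, and monitor branches do.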

While this feature list is extremely impressive, it is also intimidating. The design of Heartbeat, however, lets you ignore most of the features you don't care about in simple configurations.

A sample ha.cf configuration:

keepalive 2 # send a heartbeat every 2 seconds
deadtime 10 # declare the other node dead after 10 seconds without a heartbeat
udpport 442 # what port to use
bcast eth0 eth1 # what interfaces to heartbeat over
node fw-p # hostnames of the nodes
node fw-b
debugfile /var/log/ha-debug # what logfiles to use (later versions also support syslog)
logfile /var/log/ha-log
auto_failback off # once you fail over, keep the resources there instead of failing back when the first box comes back up
apiauth cl_status gid=haclient # allow members of the haclient group to connect to the running process to query and command heartbeat (useful for monitoring scripts, etc.)
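The relationship between keepalive and deadtime can be sketched in a few lines. This is a simplified model, not Heartbeat's implementation: the `PeerMonitor` class and its methods are hypothetical, and the clock is injectable only so the behaviour can be demonstrated without waiting. With the sample values, a peer that sends every 2 seconds has to miss roughly five heartbeats in a row before it is declared dead.

```python
import time


class PeerMonitor:
    """Tracks when each peer was last heard from on any heartbeat channel."""

    def __init__(self, nodes, deadtime, clock=time.monotonic):
        self.deadtime = deadtime  # seconds of silence before a peer is declared dead
        self.clock = clock
        start = clock()
        # Treat every peer as heard from at startup so nobody is dead immediately.
        self.last_seen = {node: start for node in nodes}

    def heartbeat_received(self, node):
        """Record a heartbeat from a node, regardless of which channel carried it."""
        self.last_seen[node] = self.clock()

    def is_dead(self, node):
        """True once the node has been silent for at least deadtime seconds."""
        return self.clock() - self.last_seen[node] >= self.deadtime
```

A monitoring loop would call `heartbeat_received` for every packet and poll `is_dead` for each peer; a longer deadtime tolerates more lost packets at the cost of slower failover.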

An example haresources config file:

fw-p 10.201.7.197 MailTo show-status

Translated into English, this says:
    when both machines are starting at the same time, fw-p should run this resource set
    the 'active' box will have the VIP 10.201.7.197
    when a box becomes active, run 'start' on the following scripts (and 'stop' when the box ceases to be active):
        MailTo (sends an e-mail reporting the failover event)
        show-status (changes /etc/issue so that the login prompt indicates whether the box is active or standby)
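The reading above can be expressed as a small parser. This is a sketch under stated assumptions rather than Heartbeat's own parsing code: the `parse_haresources_line` function is hypothetical, it assumes that `::` separates a script name from its arguments, and it assumes that a bare IP address is shorthand for the IPaddr resource script.

```python
import ipaddress


def parse_haresources_line(line):
    """Split one haresources line into the preferred node and its resources."""
    line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
    if not line:
        return None
    fields = line.split()
    node, resources = fields[0], []
    for field in fields[1:]:
        name, *args = field.split("::")
        try:
            # Assumption: a bare address stands for the IPaddr resource script.
            ipaddress.ip_address(name.split("/")[0])
            resources.append({"script": "IPaddr", "args": [field]})
        except ValueError:
            resources.append({"script": name, "args": args})
    return {"preferred_node": node, "resources": resources}
```

Applied to the example line, this yields fw-p as the preferred node and three resources in order: the address handled by IPaddr, then MailTo, then show-status.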

Heartbeat includes scripts for many purposes, including mounting filesystems, allocating IP addresses, sending notifications, sounding alerts, managing databases, managing Apache, etc.
