Getting High Availability for IBM Db2 on AWS with Db2 Pacemaker Using Overlay IP – Part 1
Author: Scott Konash | 6 min read | April 25, 2023
Pacemaker solves a problem that many companies have with their cloud transformation endeavors – how to address high availability and business continuity with IBM Db2 when lifting and shifting these systems into the cloud.
While Tivoli System Automation (TSA) and Reliable Scalable Cluster Technology (RSCT) services for high availability exist, in our experience, Pacemaker tends to be a lot more automated and robust. It’s less prone to problems than TSA clusters implemented with RSCT. Failover performance is greatly improved with Pacemaker over TSA, which can reduce mean time to recovery (MTTR) in many applications.
When combined with disaster recovery (DR), Pacemaker can manage both primary and DR site clusters of servers in one resource model. When a takeover to a DR site is done with this setup, Pacemaker automatically and seamlessly updates the status of the various resources in the model to reflect their current roles, eliminating the need for any manual reconfiguration.
What is Pacemaker?
Pacemaker is high availability cluster manager software that runs on a set of nodes. Combined with Corosync, it offers a number of features:
- Detects component failures
- Orchestrates the failover procedures needed for seamless availability and business continuity
- Eliminates the need for TSA + RSCT services
- Cloud-ready for both AWS and Azure
- Packaged with Db2, starting from Db2 LUW v11.5 Mod 6
What is Corosync?
Corosync Cluster Engine is an open-source group communication system that Pacemaker uses for node management. It was created in January 2008 as a reduced version of the OpenAIS project. The OpenAIS project was founded in 2002 to implement the Service Availability Forum Application Interface Specification APIs. These APIs were intended to provide an application framework for high availability, using clustering techniques to reduce MTTR.
Corosync offers the following:
- Provides a consistent view of cluster topology.
- Ensures a reliable messaging infrastructure for event ordering on each node.
- Applies quorum constraints.
Corosync enables servers to communicate as a cluster, while Pacemaker gives Db2 the ability to control how the cluster behaves. Corosync must be installed and configured before Pacemaker is installed and configured for Db2 failover/HA.
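To make the quorum constraint mentioned above a little more concrete, here is a minimal Python sketch of a strict-majority rule, which is the usual idea behind quorum in cluster software. It is an illustration only, not Corosync's actual implementation: the function name `has_quorum` and its parameters are hypothetical, and real clusters have additional options and tie-breaking mechanisms that this sketch ignores.

```python
def has_quorum(total_votes: int, votes_present: int) -> bool:
    """Return True when a partition holds a strict majority of all votes.

    Hypothetical helper for illustration; it assumes one vote per node
    and no tie-breaker, which is a simplification.
    """
    if total_votes <= 0:
        raise ValueError("total_votes must be positive")
    if not 0 <= votes_present <= total_votes:
        raise ValueError("votes_present must be between 0 and total_votes")
    # Strict majority: more than half of all votes must be reachable.
    return votes_present > total_votes // 2


if __name__ == "__main__":
    # A three-node cluster that loses one node still has a majority.
    print(has_quorum(total_votes=3, votes_present=2))
    # A two-node cluster that loses one node does not.
    print(has_quorum(total_votes=2, votes_present=1))
```

Under this simplified rule, a two-node cluster cannot keep quorum after losing either node, which is why two-node clusters generally need some additional tie-breaking arrangement.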
Evolution of Failover and High Availability in IBM Db2
Historically, Db2’s only means of Active/Passive HA was platform-dependent clustering (HACMP on AIX, VCS on Solaris, Red Hat clusters). This approach was problematic, most notably because of the administrative overhead and the division of responsibilities it required.
The combination of HADR and Tivoli System Automation (TSA) then allowed DBAs to set up and manage their own clusters. TSA was again problematic, due to its platform dependence and general lack of public cloud support and automation. Using open-source APIs, Pacemaker solves many of these problems.
Prior to HADR, DBAs would need a home-grown solution of Log Function Shipping to maintain a Db2 DR database in Rollforward Pending status. HADR solved this problem with replication at the log buffer layer, removing most of the administrative overhead with this setup.
Multi-target HADR allowed DBAs to use HADR to solve both problems (HA and DR) without sacrificing one for the other. Pacemaker can be implemented to solve both problems in the cloud, with HA being fully automated and DR being fully managed in one configuration.
Pacemaker vs TSA for IBM Db2 HA/DR
TSA support is available for both AIX and Linux. Pacemaker supports Linux (Intel/POWER) as well as z/Linux for on-premises or locally-virtualized solutions.
The major advantage of Pacemaker over TSA, and its primary selling point, is its support for the public cloud. Pacemaker supports cluster management and automation in both Amazon Web Services (AWS) and Microsoft Azure. TSA, on the other hand, is not supported in the cloud.
Many companies either are pursuing, or will pursue, a cloud transformation for all existing legacy systems. Most organizations run a mixed bag of disparate database management systems accumulated through acquisitions or mergers over long periods of time.
We’ve seen a trend of customers moving away from Db2 when moving to public clouds, due to (among other things) a lack of native HA support for Db2 LUW, often requiring applications to be re-architected.
Pacemaker makes Db2 in the cloud much more attractive, as it supports cluster management and automation in both AWS and Azure. Our clients often ask why HA is needed in the cloud, as public clouds have virtualization and fault-tolerance that is inherent in the cloud’s design. While cloud technologies are very resilient, they are not without their problems, including outages and security issues.
Without redundancy across availability zones and regions in the cloud, a hardware or software failure can result in hours of downtime. From a business continuity standpoint, with critical production databases, our clients need to ensure that their applications will stay up and running. HA automation is all about seamless business continuity.
In part 2 of this blog post, you’ll learn the step-by-step process for setting up Db2 Pacemaker on AWS using overlay IP. If you don’t want to wait, you can also take a look at our IDUG presentation on the topic.