Google Cloud SQL: Best Practices for Maximizing High Availability

Companies large and small that run Google Cloud SQL instances tend to share one business requirement: high availability.

High availability is a system’s ability to operate continuously without failure. For traditional, on-premises SQL instances, high availability can be fraught with data availability and consistency issues, but Google Cloud SQL makes it easy for your applications to endure outages.

Google Cloud SQL is resilient out of the box, but there are a few (unlikely) disaster scenarios where you’ll want extra assurance. Let’s walk through the options Google Cloud offers and see how to architect your Google Cloud SQL instances so you can maximize high availability and have a quiet on-call weekend.

Scenario One: Preventing a Primary Instance or Zonal Outage

The most basic example of high availability is avoiding a single point of failure — can your primary instance go offline without impacting applications?

High availability is easy to enable with Google Cloud SQL. Whether you already have a running Google Cloud SQL instance or plan on creating a new one, all you need to do is enable regional (multiple zone) availability. (If you’re enabling high availability for an existing single-zone Google Cloud SQL instance, allow for downtime: the instance will need to be reconfigured, which could take anywhere from several minutes to an hour depending on your configuration.)
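As a rough sketch of what "enabling regional availability" amounts to, the snippet below builds the request body one might send when patching an instance through the Cloud SQL Admin API. The `settings.availabilityType` field and its `ZONAL` and `REGIONAL` values come from that API; the helper name `build_availability_patch` is hypothetical, and nothing is actually sent to Google Cloud here.

```python
import json

# Availability types accepted by the Cloud SQL Admin API's settings object.
VALID_AVAILABILITY_TYPES = {"ZONAL", "REGIONAL"}


def build_availability_patch(availability_type: str = "REGIONAL") -> dict:
    """Return a patch body that sets an instance's availability type.

    Hypothetical helper for illustration: it only builds the JSON body,
    it does not call the API.
    """
    if availability_type not in VALID_AVAILABILITY_TYPES:
        raise ValueError(f"unknown availability type: {availability_type!r}")
    return {"settings": {"availabilityType": availability_type}}


# Example: the body for switching an instance to regional (high availability).
print(json.dumps(build_availability_patch("REGIONAL"), indent=2))
```

The same change can be made in the Google Cloud console or with the gcloud CLI; the body above is just the smallest way to show which setting is involved.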

Once enabled, your architecture will look something like this: 

Note: The read replica is a separate Google Cloud SQL instance that can handle read operations from your application. 

Google Cloud SQL Diagram

With this architecture in place, Google Cloud SQL will automatically fail over from your primary instance to the standby instance in the event of an outage. Typically, this will take ~60 seconds to initiate. Under the hood, Google Cloud SQL attaches the existing disk, with all of the data up to the point of the outage, to a new instance and repoints the database IP address your applications are connecting to.

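Because the IP address stays the same, your applications can reconnect without any configuration change and carry on as normal, as long as they retry connections that were dropped during the failover.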
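Reconnecting is ultimately the application's job, so here is a minimal sketch of a retry loop with exponential backoff. Everything in it is an assumption for illustration: `connect` stands in for whatever your database driver uses to open a connection, it is assumed to raise `ConnectionError` while the instance is unavailable, and the attempt counts and delays are placeholders to tune for your own workload.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 8,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `connect` until it succeeds or `attempts` is exhausted.

    Waits base_delay, 2*base_delay, 4*base_delay, ... between attempts,
    capped at max_delay. `sleep` is injectable so the loop can be tested
    without actually waiting.
    """
    last_error = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:
            last_error = exc
            if attempt == attempts - 1:
                break  # no point sleeping after the final attempt
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error
```

Many drivers and connection pools already offer something similar; if yours does, prefer its built-in retry settings over a hand-rolled loop.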

However, you should note that adding this level of high availability to your Google Cloud SQL instance does incur a cost, and not everyone will need it. For some, the benefit may not justify the cost: they can survive small periodic outages, but they still need a solution for disaster recovery without high availability.

In this case, we can configure Google Cloud SQL to perform automated backups. With these backups, we can restore data to a new instance in a different zone (or region). It’s a solid disaster recovery option for a single-zone outage, with the caveat that anything written after the most recent backup is not included in the restore.
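To make the backup setting concrete, the sketch below builds a patch body that turns on automated backups with a daily start time. The `backupConfiguration.enabled` and `backupConfiguration.startTime` fields (a 24-hour `HH:MM` value in UTC) come from the Cloud SQL Admin API; the helper name and the `03:00` default are assumptions chosen for illustration, and no API call is made.

```python
import re

# Matches a 24-hour HH:MM time such as "03:00" or "23:45".
_HH_MM = re.compile(r"([01]\d|2[0-3]):[0-5]\d")


def build_backup_patch(start_time_utc: str = "03:00") -> dict:
    """Return a patch body enabling automated backups at the given UTC time.

    Hypothetical helper: it validates the time format and builds the body,
    nothing more.
    """
    if not _HH_MM.fullmatch(start_time_utc):
        raise ValueError(f"start time must be HH:MM (24h, UTC): {start_time_utc!r}")
    return {
        "settings": {
            "backupConfiguration": {
                "enabled": True,
                "startTime": start_time_utc,
            }
        }
    }


# Example: schedule the daily backup window to start at 03:00 UTC.
print(build_backup_patch())
```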

But what happens when the entire region goes down?

Scenario Two: Preventing a Regional Outage

Many companies distribute applications across multiple regions, and they can’t have an outage in one region affecting the others.

Depending on the type and size of the SQL data — as well as technical and budget constraints — this might be a good use case for Google Cloud Spanner. But, for now, we’ll look at how Google Cloud SQL can help you survive these kinds of failures.

After we create a Google Cloud SQL instance, we can make special copies of it called “read replicas.” These replicas receive updates from the primary instance as they’re written, so they’re kept up to date in near real time (replication is asynchronous, so a replica can lag slightly behind the primary). While you can’t perform write operations on read replica instances, they’re pretty good at helping to scale out your reads. (That is, if you can architect your application to send its read operations to a separate instance.)

Read replicas are also good at disaster recovery. We can create read replicas in regions other than the primary one and, in the event of a regional outage, promote one of them to take over as the primary. Unlike a high availability failover, the promoted replica is a different instance with its own address, so you’ll need to repoint your clients to the new instance.
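Repointing is easier when the database address lives in one place instead of being scattered through application code. The sketch below shows one way to model that: a small object that holds named endpoints plus a pointer to the active one. The class, the instance names, and the IP addresses are all hypothetical; in practice the same idea is often implemented with an environment variable, a DNS record, or a secrets/config service.

```python
from dataclasses import dataclass, replace
from typing import Dict


@dataclass(frozen=True)
class DatabaseTargets:
    """Named database endpoints plus the one clients should currently use."""

    endpoints: Dict[str, str]  # instance name -> host or IP
    active: str

    def host(self) -> str:
        """Return the address of the currently active instance."""
        return self.endpoints[self.active]

    def repoint(self, name: str) -> "DatabaseTargets":
        """Return a copy that points at `name` (e.g. a promoted replica)."""
        if name not in self.endpoints:
            raise KeyError(f"unknown instance: {name}")
        return replace(self, active=name)


# Hypothetical names and addresses, for illustration only.
targets = DatabaseTargets(
    endpoints={"primary-us-central1": "10.0.0.5", "replica-us-east1": "10.1.0.5"},
    active="primary-us-central1",
)

# After promoting the cross-region replica, switch clients over in one step.
targets = targets.repoint("replica-us-east1")
print(targets.host())
```

The switch is deliberately manual here: a replica only accepts writes once it has been promoted, so automatically falling back to it would not be safe.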

Scenario Three: Preventing Data Loss and Schema Migration Failures

While not entirely related to high availability, few things are more distressing than a half-applied schema migration or an accidental table drop. Unfortunately, your standby instances and read replicas will have the exact same data on them (by design!), so we’ll need a different solution to survive this disaster.

Google Cloud SQL can take continuous backups via point-in-time recovery. With point-in-time recovery, Google Cloud SQL archives the database’s transaction logs (binary logs on MySQL), which can be used to restore your Google Cloud SQL instances to precise moments in time, say right before the accidental table drop. These backups provide an excellent insurance policy against accidents, and they’re automatically enabled on new Google Cloud SQL instances.
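As an illustration of "right before the accidental table drop," the sketch below computes a restore point slightly earlier than a known incident time and builds a clone request body around it. It assumes the restore is done by cloning to a new instance through the Cloud SQL Admin API, whose clone context accepts a `destinationInstanceName` and an RFC 3339 `pointInTime`. The helper name, the 30-second safety margin, and the instance name are hypothetical, and nothing is sent to the API.

```python
from datetime import datetime, timedelta, timezone


def build_pitr_clone_body(
    destination: str,
    incident_at: datetime,
    margin: timedelta = timedelta(seconds=30),
) -> dict:
    """Return a clone body that restores to just before `incident_at`.

    `incident_at` must be timezone-aware; the restore point is expressed
    in UTC as an RFC 3339 timestamp.
    """
    if incident_at.tzinfo is None:
        raise ValueError("incident_at must be timezone-aware")
    restore_point = (incident_at - margin).astimezone(timezone.utc)
    return {
        "cloneContext": {
            "destinationInstanceName": destination,
            "pointInTime": restore_point.strftime("%Y-%m-%dT%H:%M:%SZ"),
        }
    }


# Example: the table was dropped at 12:00:00 UTC; restore to 30 seconds earlier.
incident = datetime(2024, 1, 15, 12, 0, 0, tzinfo=timezone.utc)
print(build_pitr_clone_body("orders-db-recovered", incident))
```

Because the result is a separate instance, you can inspect the recovered data first and then either copy the missing tables back or repoint your application to the new instance.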

Get a Secure Start on Google Cloud

Google Cloud SQL is just one of many Google Cloud services that help developers innovate quickly and securely. Talk to 66degrees’ cloud experts about Google Cloud SQL implementation or other services right for your business.
