Cloud SQL failover does not trigger for HA MySQL instance with read replica during maintenance window

We have a Cloud SQL MySQL instance configured with high availability and a read replica in a different region. During a recent outage when the primary instance became unresponsive, the automatic failover to the standby didn’t trigger as expected, resulting in 45 minutes of downtime.

Our Cloud SQL HA configuration shows enabled, and we have both a standby instance in the same region and a read replica in another region. The failover health checks should have detected the primary failure, but nothing happened. We had to manually promote the read replica, which caused data inconsistency issues.

I’m confused about the read replica limitations in this scenario. Should failover work with read replicas, or is there something specific about the HA standby versus read replicas? The documentation isn’t entirely clear on this distinction.

That explains the data inconsistencies we saw. What should we have done instead? And how do we prevent the standby from lagging in the future?

Yes, that’s exactly it. Cloud SQL HA failover has safety mechanisms to prevent data loss. If the standby replica is too far behind (typically more than a few seconds of replication lag), it won’t automatically failover because it would result in lost transactions. You need to investigate why your standby was lagging - could be network issues, high write load, or resource constraints on the standby.

Checked the operations log and found this: “Failover cancelled: Standby instance not synchronized. Data loss would exceed acceptable threshold.” So it seems the standby was too far behind to safely failover. Does this mean our replication was lagging?