Cloud SQL failover does not trigger for HA MySQL instance with read replica during maintenance window

george_sys · December 2, 2024, 11:14am

We have a Cloud SQL MySQL instance configured with high availability and a read replica in a different region. During a recent outage when the primary instance became unresponsive, the automatic failover to the standby didn’t trigger as expected, resulting in 45 minutes of downtime.

Our Cloud SQL HA configuration shows enabled, and we have both a standby instance in the same region and a read replica in another region. The failover health checks should have detected the primary failure, but nothing happened. We had to manually promote the read replica, which caused data inconsistency issues.

I’m confused about the read replica limitations in this scenario. Should failover work with read replicas, or is there something specific about the HA standby versus read replicas? The documentation isn’t entirely clear on this distinction.

barbara_func · January 6, 2025, 4:28pm

That explains the data inconsistencies we saw. What should we have done instead? And how do we prevent the standby from lagging in the future?

megha_king · December 21, 2024, 6:26am

Yes, that’s exactly it. Cloud SQL HA failover has safety mechanisms to prevent data loss. If the standby replica is too far behind (typically more than a few seconds of replication lag), it won’t automatically failover because it would result in lost transactions. You need to investigate why your standby was lagging - could be network issues, high write load, or resource constraints on the standby.

kavya_980 · December 16, 2024, 12:23am

Checked the operations log and found this: “Failover cancelled: Standby instance not synchronized. Data loss would exceed acceptable threshold.” So it seems the standby was too far behind to safely failover. Does this mean our replication was lagging?

Topic		Replies	Views
Cloud SQL automated backups fail during peak hours due to database locks and transaction contention Google Cloud Platform (GCP) question , database , sql , gcp-2019 , lock-contention , pitr , cloud-sql , backup-disaster , backup-scheduling	3	0	January 16, 2025
Cloud SQL failover lag triggers high connection errors in application tier Google Cloud Platform (GCP) question , compute , database , java , high-availability , gcp-2020 , retry-logic , connection-pooling , cloud-sql	3	0	January 27, 2025
Cloud SQL replica lag spikes during failover, causing stale data in analytics Google Cloud Platform (GCP) question , analytics , database , gcp-2021 , cloud-sql , failover , databases , sql-replica-lag , replication	6	0	September 28, 2025
Cloud SQL replication lag leads to stale data in data warehouse reporting Google Cloud Platform (GCP) question , monitoring , data-warehousing , database , gcp-2020 , stale-data , replication-lag , read-replica , cloud-sql	6	0	December 10, 2024
Cloud SQL failover triggers ERP downtime due to DNS propagation delays Google Cloud Platform (GCP) question , compute , database , high-availability , gcp-2021 , connection-timeout , cloud-sql , failover , dns-propagation	3	2	March 2, 2025
Aurora failover latency causes ERP transaction stalls during maintenance Amazon Web Services (AWS) question , database , connection-pool , aws-2021 , cloudwatch , aurora-mysql , aurora , aurora-failover , transaction-stall	6	0	July 19, 2025
Cloud SQL query latency spikes but Stackdriver logs missing slow queries Google Cloud Platform (GCP) question , database , sql , observability , gcp-2021 , missing-logs , cloud-logging , cloud-sql , performance-troubleshooting	6	0	July 3, 2025
SQL Database geo-replication shows high data sync lag after planned failover test Microsoft Azure monitoring , storage , database , rpo , rto , az-2021 , azure-sql , failover , geo-replication	7	0	August 24, 2025
Autonomous Database backup fails for cross-region disaster recovery, showing 'Backup not found' error Oracle Cloud question , disaster-recovery , database , devops-auto , backup , oci-2020 , cross-region , autonomous-database , ocid	4	0	January 24, 2025

Cloud SQL failover does not trigger for HA MySQL instance with read replica during maintenance window

Related topics