Failover Detection

Failover Detection: Architectural Deep-Dive

The Problem

Database failures are not instantaneous, clean events. A node doesn’t simply switch from healthy to dead — it degrades. Replication lag creeps up, connections start timing out, queries begin stacking. By the time a traditional monitoring system raises an alert and a human responds, your application has already been routing traffic to a backend that is returning errors, serving stale data, or not responding at all.

The conventional approach to failover relies on external orchestration tools, DNS TTL changes, or load balancer health checks operating on coarse intervals. Each of these introduces a detection and propagation delay measured in seconds to minutes. For a database layer, that window is the duration of your outage.

The ProxySQL Approach

ProxySQL maintains a continuous, active view of every backend node it manages. Health checks run at configurable sub-second intervals — not as a passive observation layer, but as a direct participant in the connection pool decision. The moment a node fails a health check threshold, it is removed from the active host group and receives no further traffic. No external signal required. No DNS flush. No application restart.

Beyond simple up/down detection, ProxySQL understands replication topology. It monitors replication lag on replica nodes and can automatically demote a replica from the active read pool if lag exceeds a defined threshold. A replica that is technically alive but seconds behind the primary is not a safe read target for many workloads — ProxySQL treats it accordingly without manual intervention.

What You Get

Topology Awareness means ProxySQL isn’t just checking if a port is open. It queries replication metadata directly, tracking which nodes are primaries, which are replicas, and what their replication state is at any given moment. Routing decisions reflect actual topology state, not a static configuration that goes stale the moment something changes.

The Health Check Engine probes each backend continuously and applies configurable thresholds before acting. A single failed check doesn’t immediately pull a node — transient network hiccups don’t trigger unnecessary failovers. But sustained failure does, quickly and automatically, without waiting for a human to notice.

Automatic Recovery closes the loop. Once a failed node passes health checks consistently, ProxySQL reintegrates it into the connection pool without manual reconfiguration. The recovery is as automatic as the failover.

The Result

ProxySQL collapses the failure detection and response cycle from minutes to seconds — or less. Your application continues issuing queries through a single endpoint while ProxySQL silently reroutes traffic around degraded or failed nodes. No application code changes, no operator intervention at 3am, and no extended outage window while DNS propagates or orchestration scripts run. The database layer becomes self-healing at the infrastructure level, where it belongs.

Architectural Value

Key Capabilities

Reference Articles

Implementation

Architectural Review