
@ Andrey
2025-03-26 12:06:30
When designing a highly available Kubernetes (or k3s) cluster, one of the key architectural questions is: **"How many ETCD nodes should I run?"**
A recent discussion in our team sparked this very debate. Someone suggested increasing our ETCD cluster size from 3 to more nodes, citing concerns about node failures and the need for higher fault tolerance. It’s a fair concern—nobody wants a critical service to go down—but here's why **3-node ETCD clusters are usually the sweet spot** for most setups.
---
## The Role of ETCD and Quorum
ETCD is a distributed key-value store that Kubernetes uses to store all of its cluster state. It is built on the Raft consensus algorithm, which means it relies on quorum to operate: more than half of the ETCD nodes must be online and in agreement for the cluster to function correctly.
### What Quorum Means in Practice
- In general, quorum for an *n*-node cluster is **⌊n/2⌋ + 1**.
- In a **3-node** ETCD cluster, quorum is **2** (tolerates 1 failure).
- In a **5-node** cluster, quorum is **3** (tolerates 2 failures).
<img src="https://blossom.primal.net/36cd64d4478ea93cf954fdbb70aeba52053dd8bb610a502b4b6bb6507eab06c8.png">
⚠️ So yes, a 5-node cluster tolerates 2 failures versus just 1 in a 3-node setup, but it also needs more nodes online (3 instead of 2) to stay functional. Adding nodes does not increase safety linearly.
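Since quorum is ⌊n/2⌋ + 1 for an *n*-member cluster, the trade-off is easy to check with a few lines of plain shell arithmetic (no ETCD required). Note that even-sized clusters buy nothing: 4 members tolerate only 1 failure, just like 3.

```shell
# quorum = floor(n/2) + 1; fault tolerance = n - quorum
for n in 1 2 3 4 5 6 7; do
  quorum=$(( n / 2 + 1 ))
  tolerance=$(( n - quorum ))
  echo "members=$n quorum=$quorum tolerates=$tolerance"
done
# members=3 quorum=2 tolerates=1
# members=4 quorum=3 tolerates=1
# members=5 quorum=3 tolerates=2
```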
---
## Why 3 Nodes is the Ideal Baseline
Running 3 ETCD nodes hits a great balance:
- **Fault tolerance:** 1 node can fail without issue.
- **Performance:** Fewer nodes = faster consensus and lower latency.
- **Simplicity:** Easier to manage, upgrade, and monitor.
Even the [ETCD documentation](https://etcd.io/docs/v3.6/faq/) recommends 3–5 nodes total, with 5 being the **upper limit** before write performance and operational complexity start to degrade.
Google's Chubby, an inspiration for systems like ETCD and ZooKeeper, also runs each cell with no more than 5 replicas.
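Standing up that 3-node baseline with k3s is straightforward. A minimal sketch, assuming hypothetical hostnames and a shared token (flags per the k3s HA docs; verify against your version):

```shell
# On the first server: initialize a new embedded-ETCD cluster.
curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server --cluster-init

# On servers 2 and 3: join the existing cluster via the first server.
curl -sfL https://get.k3s.io | K3S_TOKEN=SECRET sh -s - server \
  --server https://server-1:6443
```

With three server nodes running, ETCD has quorum (2 of 3) and can survive the loss of any one node.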
---
## The Myth of Catastrophic Failure
> "If two of our three ETCD nodes go down, the cluster will become unusable and need deep repair!"
This is a common fear, but the reality is less dramatic:
- **ETCD becomes effectively read-only**: Without quorum, ETCD rejects writes, so you can't schedule or update workloads, but existing workloads continue to run.
- **No deep repair needed**: As long as there's no data corruption, restoring quorum just requires bringing at least one other ETCD node back online.
- **Still recoverable if two nodes are permanently lost**: In k3s, you can reset the surviving node to a new single-node ETCD cluster with `--cluster-reset`, and rebuild from there.
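The worst-case recovery above can be sketched roughly as follows, run on the surviving server node (a sketch assuming k3s installed as a systemd service; flag names per the k3s docs, so verify against your version before relying on it):

```shell
# Stop the k3s service on the surviving server.
systemctl stop k3s

# Reset the embedded ETCD to a new single-member cluster
# seeded from the local data directory.
k3s server --cluster-reset

# Start k3s again, then join fresh server nodes to restore the 3-node setup.
systemctl start k3s
```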
---
## What About Backups?
In k3s, ETCD snapshots are taken automatically by default and stored on each server node:
- Default path: `/var/lib/rancher/k3s/server/db/snapshots/`
You can restore these snapshots in case of failure, making ETCD even more resilient.
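Working with those snapshots can be sketched like this (subcommand and flag names per the k3s docs; the snapshot filename is a placeholder, so adjust for your environment):

```shell
# List the snapshots available on this server node.
k3s etcd-snapshot ls

# Take an on-demand snapshot in addition to the scheduled ones.
k3s etcd-snapshot save

# Restore: resets ETCD to a single-node cluster seeded from the snapshot.
k3s server --cluster-reset \
  --cluster-reset-restore-path=/var/lib/rancher/k3s/server/db/snapshots/<snapshot-name>
```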
---
## When to Consider 5 Nodes
Adding more ETCD nodes **only makes sense at scale**, such as:
- Running **12+ total cluster nodes**
- Needing **stronger fault domains** for regulatory/compliance reasons
> Note: ETCD typically requires low-latency communication between nodes. Distributing ETCD members across availability zones or regions is generally discouraged unless you're using specialized networking and understand the performance implications.
Even then, be cautious—you're trading some simplicity and performance for that extra failure margin.
---
## TL;DR
- **3-node ETCD clusters** are the best choice for most Kubernetes/k3s environments.
- **5-node clusters** offer more redundancy but come with extra complexity and performance costs.
- **Loss of quorum is not a disaster**—it’s recoverable.
- **Backups and restore paths** make even worst-case recovery feasible.
And finally: if you're seeing multiple ETCD nodes go down *frequently*, the real problem might not be the number of nodes—but your hosting provider.
---