Proxmox Cluster Nodes No Longer Visible in GUI

In Proxmox VE, cluster nodes are no longer displayed in the web interface, or they appear with a red X. This is often caused by a corrupted corosync state, e.g. after a memory leak or a crash of the corosync daemon.

Important: ALL nodes must be stopped completely. If even one node keeps running with a corrupted state, it will infect all other nodes when they rejoin.

1. Verify SSH access between all nodes

From every node, SSH into every other node (especially the master) and accept host keys if prompted:

# From master:
ssh server2
ssh server3

# From each other node, SSH to master:
ssh server1

All nodes must trust each other: no SSH host key prompts may remain, otherwise the automated script or the manual procedure will hang.
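Instead of accepting each prompt by hand, host keys can be pre-seeded non-interactively with the standard OpenSSH tool ssh-keyscan. A minimal sketch, assuming the hypothetical node names server1 to server3 (adjust NODES to your cluster):

```shell
# Pre-accept host keys so no interactive prompt can stall a later SSH loop.
# NODES is an assumption: list every node of your cluster here.
NODES="server1 server2 server3"
mkdir -p ~/.ssh
for n in $NODES; do
  # -H stores the hostnames hashed in known_hosts, matching the OpenSSH default
  ssh-keyscan -H "$n" >> ~/.ssh/known_hosts 2>/dev/null
done
```

Run this on every node so the trust is established in all directions.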

2. Stop ALL services on ALL nodes

On every single node including the master, stop all cluster services:

# SIGKILL on purpose: a corosync hung in a memory leak often ignores a normal stop
killall -9 corosync
systemctl stop pve-cluster
systemctl stop pvedaemon
systemctl stop pveproxy
systemctl stop pvestatd

Wait until all nodes are fully stopped before proceeding. Running VMs are not affected — KVM processes run independently of cluster services.

3. Wait 60 seconds

Give all nodes time to fully release cluster state and connections.

4. Start master node first

On the master node only:

service corosync start
sleep 5
pvecm expected 1
systemctl start pve-cluster
systemctl start pvedaemon
systemctl start pveproxy
systemctl start pvestatd

The command pvecm expected 1 temporarily sets expected votes to 1, allowing the master to become quorate alone. This resets automatically when other nodes join.
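Before moving on, you can confirm the master really is quorate by checking the Flags line of pvecm status. A sketch against an abbreviated, hypothetical status excerpt (on a live master you would pipe pvecm status directly, as noted in the comment):

```shell
# Hypothetical excerpt of `pvecm status` after `pvecm expected 1`:
status="Expected votes:   1
Total votes:      1
Flags:            Quorate"

# On a live master the check would be: pvecm status | grep -q Quorate
if echo "$status" | grep -q "Quorate"; then
  quorate=yes
fi
echo "quorate: $quorate"
```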

5. Wait 60 seconds

Let the master fully stabilize before adding other nodes.

6. Start remaining nodes one by one

On each remaining node, one at a time:

service corosync start
systemctl start pve-cluster
systemctl start pvedaemon
systemctl start pveproxy
systemctl start pvestatd

Wait a few seconds between each node to allow it to join the cluster.
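Rather than waiting a fixed number of seconds, you can watch the Nodes: counter of pvecm status grow on the master. A sketch that parses a hypothetical status excerpt; the commented line shows what live polling could look like:

```shell
# Hypothetical `pvecm status` excerpt after the second node has joined:
status="Nodes:            2
Expected votes:   3
Total votes:      2"

# Live polling on the master could look like:
#   until [ "$(pvecm status | awk '/^Nodes:/{print $2}')" -ge 2 ]; do sleep 2; done
joined=$(echo "$status" | awk '/^Nodes:/{print $2}')
echo "nodes joined: $joined"
```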

7. Verify

On any node, check that all nodes are visible and quorum is established:

pvecm status
pvecm nodes

All nodes should now appear in the web interface with green status.


Automated Script

Instead of performing these steps manually, you can run this script on the master node; it automates the entire procedure. It requires sshpass (apt install sshpass) for password-based SSH. Replace YOUR_ROOT_PASSWORD first.

PW="YOUR_ROOT_PASSWORD"
# Remote node names: last column of `pvecm nodes`, without the header line and the local node
REMOTE=$(pvecm nodes | grep -oP "\S+$" | grep -v "(local)" | grep -v "Name")
for n in $REMOTE; do echo -n "$n stop: "; sshpass -p "$PW" ssh -4 -o StrictHostKeyChecking=no root@$n "killall -9 corosync; systemctl stop pve-cluster pvedaemon pveproxy pvestatd" 2>/dev/null && echo OK || echo FAIL; done
echo "Stopping master..."; killall -9 corosync; systemctl stop pve-cluster pvedaemon pveproxy pvestatd
echo "Waiting 60s..."; sleep 60
echo "Starting master..."; service corosync start; sleep 5; pvecm expected 1; systemctl start pve-cluster; sleep 3; systemctl start pvedaemon pveproxy pvestatd
echo "Waiting 60s..."; sleep 60
for n in $REMOTE; do echo -n "$n start: "; sshpass -p "$PW" ssh -4 -o StrictHostKeyChecking=no root@$n "service corosync start; systemctl start pve-cluster pvedaemon pveproxy pvestatd" 2>/dev/null && echo OK || echo FAIL; sleep 5; done
echo "Done!"; sleep 5; pvecm status | grep -E "Nodes:|Quorate"; pvecm nodes
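The REMOTE= line at the top of the script deserves a note: pvecm nodes prints a membership table whose last column is the node name, with the local node suffixed by (local). A sketch of the same extraction against sample output (node names are hypothetical):

```shell
# Hypothetical `pvecm nodes` membership listing:
nodes="Nodeid      Votes Name
         1          1 server1 (local)
         2          1 server2
         3          1 server3"

# Take the last token of each line; the local node's last token is "(local)",
# the header's is "Name". Both are filtered out, leaving only remote names.
REMOTE=$(echo "$nodes" | grep -oP "\S+$" | grep -v "(local)" | grep -v "Name")
echo "$REMOTE"
```

This is why the script never stops or starts services on the local node over SSH; the master is handled by the direct commands instead.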

Important Notes

  • VMs are not affected. KVM processes run independently of cluster services.
  • Do not reboot servers — only restart services.
  • ALL nodes must be stopped first. A node whose corosync state is corrupted (e.g. a memory leak that has grown to several GB) will pass its broken state on to rejoining nodes.
  • pvecm expected 1 lets the master reach quorum alone. Resets automatically when others join.
  • If a node refuses to join: check whether /var/lib/pve-cluster/config.db has the same version as on the other nodes. If not, stop pve-cluster on that node, move config.db aside (keep a backup instead of deleting it) and restart; the database will then be synced from the cluster.