cloudstack-users mailing list archives

From Rohan T <>
Subject Is used by CloudStack?
Date Tue, 12 Jul 2016 00:33:19 GMT
Hi All,

Having been smashed by the unexpected behaviour of the KVM Heartbeat / HA
process, we've been working through its logic, and I now believe its
intent can be summarised as follows:

The heartbeat process consists of 3 parts:

1. a shell script that's distributed to each of the hypervisors during the
CloudStack installation process:
2. Two Java classes, built into CloudStack


Each of the classes periodically calls the script with different
arguments. The script is used to confirm the existence of NFS mounts,
remount any that are missing, clean up (i.e. kill) VMs in an
indeterminate state, read and write heartbeats to NFS volumes, and force
the host hypervisor to reboot (as part of a "shoot the node in the head"
approach to restoring sanity to the cluster).

The KVMHAMonitor class writes a timestamp to each of the NFS volumes
(pools) every minute. If this write times out (4 times), it then calls the
script once more to force a reboot of the host (via: echo b >
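
To make the monitor logic above concrete, here is a minimal Python sketch
of my reading of it. It is not CloudStack's code: heartbeat_round,
write_heartbeat and force_reboot are hypothetical names, and I am assuming
that a single pool exhausting its 4 attempts is enough to trigger the
reboot, which the description above does not settle.

```python
from typing import Callable, Iterable, List

MAX_ATTEMPTS = 4  # "times out (4 times)" in the description above


def heartbeat_round(pools: Iterable[str],
                    write_heartbeat: Callable[[str], bool],
                    force_reboot: Callable[[], None],
                    max_attempts: int = MAX_ATTEMPTS) -> List[str]:
    """Run one heartbeat round and return the pools whose write never succeeded.

    write_heartbeat and force_reboot are hypothetical stand-ins for the two
    ways the Java class invokes the shell script.
    """
    failed = []
    for pool in pools:
        # any() stops at the first successful write, so a healthy pool is
        # written once and a failing pool is tried max_attempts times.
        ok = any(write_heartbeat(pool) for _ in range(max_attempts))
        if not ok:
            failed.append(pool)
    if failed:
        # Assumption: one failed pool is enough to reboot the host.
        force_reboot()
    return failed
```

The loop that calls this once a minute is left out; only the retry and
reboot decision is shown.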

The KVMHAChecker is responsible for triggering the script to read the
heartbeat value and compare it with the current timestamp. Where ALL NFS
volumes are determined to be "DEAD" (i.e. timestamp is older than 60


Is my understanding correct?

The problem is that, when testing this logic in my test lab (currently
4.4.4, but there have been no significant updates committed to these files
since), I've been unable to see any evidence of the KVMHAChecker actually
executing! I see plenty of evidence of heartbeat writes (and of hypervisor
reboots triggered when this process times out).

