cloudstack-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Bjoern Teipel <bjoern.tei...@gmail.com>
Subject CLOUDSTACK-5859 [HA] Shared storage failure results in reboot loop; VMs with Local storage brought offline
Date Thu, 27 Mar 2014 05:48:00 GMT
Hi folks,

in light of Cloudstack issue #5859 I would like to know what the
intention was for the kvmheartbeat.sh script, which ultimately can
reboot (fence) machines.
I had the unfortunate position to found 10 hypervisor in an unknown
state after the NFS volume became unresponsive while CMAN/CLVM and
fenced were running and everything went south. Because a reboot by
this script caused the hypervisor to fence and after over 50% of the
cluster nodes left, the cluster was in an state without quorum.
I personally can't see why anyone want a booting hypervisor,
especially if other storage pools like CLVM or local where serving
fine and would have increased the availability of those VMs.

Usually you fix your NFS,Storage or network problem and reboot the
affected VMs....

Thanks,
Bjoern

Mime
View raw message