cloudstack-issues mailing list archives

From "Simon Weller (JIRA)" <>
Subject [jira] [Commented] (CLOUDSTACK-8943) KVM HA is broken, let's fix it
Date Mon, 12 Oct 2015 11:39:05 GMT


Simon Weller commented on CLOUDSTACK-8943:

Perhaps one of the easiest ways to deal with this would be to introduce IPMI functionality
into CloudStack, so that a KVM host could be fenced via an out-of-band IPMI interface. Upon
successful fencing, CS MGMT could mark the host as disabled. I know that deleting a host is
enough to force CS MGMT to attempt to restart the affected VMs on other hosts, but I'm not
sure whether disabling a host currently has the same effect.
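
To make the idea concrete, here is a rough sketch of what such a fencing step might look like,
written in Python around the ipmitool CLI. It is only an illustration under assumptions, not
how CloudStack does or would implement it: the names fence_host and handle_unreachable_host,
the BMC address and user, and the disable_host callback are all hypothetical. The ipmitool
options used (-I lanplus, -H, -U, -E to read the password from IPMI_PASSWORD, and the
"chassis power off" / "chassis power status" commands) are standard.

```python
import subprocess
import time
from typing import Callable, Sequence

# A runner takes ipmitool arguments and returns its stdout.
Runner = Callable[[Sequence[str]], str]


def run_ipmitool(args: Sequence[str]) -> str:
    """Run the real ipmitool binary; raises on non-zero exit or timeout."""
    result = subprocess.run(
        ["ipmitool", *args],
        check=True, capture_output=True, text=True, timeout=30,
    )
    return result.stdout


def fence_host(bmc_address: str, user: str, run: Runner = run_ipmitool,
               attempts: int = 5, delay: float = 2.0) -> bool:
    """Power a host off through its BMC and confirm that it is off.

    Returns True only when the BMC itself reports the chassis as off.
    Any error is treated as "not fenced", so callers never assume a
    host is down when that could not be verified.
    """
    # -E makes ipmitool read the password from the IPMI_PASSWORD variable.
    base = ["-I", "lanplus", "-H", bmc_address, "-U", user, "-E"]
    try:
        run([*base, "chassis", "power", "off"])
        for _ in range(attempts):
            status = run([*base, "chassis", "power", "status"])
            # ipmitool prints "Chassis Power is off" once the host is down.
            if status.strip().lower().endswith("is off"):
                return True
            time.sleep(delay)
    except (subprocess.SubprocessError, OSError):
        return False
    return False


def handle_unreachable_host(host_id: str, fence: Callable[[], bool],
                            disable_host: Callable[[str], None]) -> bool:
    """Hypothetical management-side flow: disable a host only after fencing."""
    if fence():
        disable_host(host_id)  # assumed hook into the management server
        return True
    return False
```

The point of checking the power status afterwards, rather than trusting the power-off command,
is that VMs should only be restarted elsewhere once the original host is confirmed down.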

Other considerations will also need to be addressed, especially around storage locking
(e.g. Ceph).

> KVM HA is broken, let's fix it
> ------------------------------
>                 Key: CLOUDSTACK-8943
>                 URL:
>             Project: CloudStack
>          Issue Type: Bug
>      Security Level: Public(Anyone can view this level - this is the default.) 
>         Environment: Linux distros with KVM/libvirt
>            Reporter: Nux
> Currently KVM HA works by monitoring an NFS-based heartbeat file, and it can often fail
> whenever this network share becomes slower, causing the hypervisors to reboot.
> This can be particularly annoying when you have different kinds of primary storage in
> place which are working fine (people running Ceph etc.).
> Having to wait for the affected HV which triggered this to come back and declare it's
> not running VMs is a bad idea; this HV could require hours or days of maintenance!
> This is embarrassing. How can we fix it? Ideas, suggestions? How are other hypervisors
> doing it?
> Let's discuss, test, implement. :)

This message was sent by Atlassian JIRA
