Apologies for the fragmented messages.

In the existing framework, CloudStack does not know for certain whether your VMs are dead, the KVM hypervisor has crashed, it is just a network blip, or the KVM agent was stopped (or died). It takes a conservative approach and does not restart the VMs on other hypervisors, to avoid a split-brain scenario.

The only time it will restart a KVM hypervisor and move the VMs over is when one of the hypervisors in the cluster loses access to primary storage, which is detected using the NFS heartbeat method I mentioned earlier.

The new framework addresses the limitations above as follows:

1) It checks whether there is any disk activity on the VMs that are in an uncertain state.
2) If there is no activity on ALL of those VMs for "x" number of seconds, CloudStack issues an IPMI fence command to power down/reboot the host (via iLO, DRAC, or something similar).
3) The VMs are restarted elsewhere.

Regards
ilya

On Tue, Jul 18, 2017 at 6:10 AM, ilya musayev wrote:

> What shared primary storage backend do you have for your VMs?
>
> If it is NFS, the CloudStack agent writes a heartbeat. When an issue occurs, the
> neighbor hosts check whether the hypervisor that failed still writes to the
> heartbeat file. There are a bunch of corner cases where CloudStack HA does not
> kick in, due to uncertainty.
>
> The new framework should address those uncertainties.
>
> KVM HA with IPMI Fencing - Apache Cloudstack - Apache Software ...
>
> [CLOUDSTACK-8943] KVM HA is broken, let's fix it - ASF JIRA
>
>
> Regards
> ilya
>
> On Tue, Jul 18, 2017 at 6:06 AM, ilya musayev <
> ilya.mailing.lists@gmail.com> wrote:
>
>> Hi Victor
>>
>> We recently rewrote the KVM HA framework. It's being merged into the latest build.
>>
>>
>> On Tue, Jul 18, 2017 at 5:39 AM, victor wrote:
>>
>>> Hello Guys,
>>>
>>> I am facing the same issue that is mentioned in the following URL.
>>>
>>> -----------------
>>>
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>>>
>>> -------------
>>>
>>> When the host is put in maintenance mode, the HA-enabled VMs are
>>> automatically migrated to an available host. But when the KVM host is down, no
>>> HA is done. The VMs stay down until I bring the host node back up.
>>>
>>>
>>> I have tried everything, including the following.
>>>
>>> =====
>>>
>>> 1. System VMs and client VMs are created on shared storage
>>>
>>> 2. Added ha.tag host tags
>>>
>>> 3. Created the host by adding the HA tag
>>>
>>> 4. Created VMs on the HA-enabled host with an HA-enabled service offering
>>>
>>> ====
>>>
>>> Have you guys successfully tested HA? I am really stuck at this part.
>>>
>>> Regards
>>>
>>>
>>>
>>>
>>
>
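
P.S. To make the three steps of the new framework described at the top of this message concrete, here is a minimal Python sketch of the decision flow. It is not CloudStack code: the names VM, host_looks_dead, recover_host, fence_host and restart_vm are all hypothetical, and two behaviours are assumptions rather than anything stated in this thread (a host with no VMs is never fenced, and VMs are not restarted if the fence command fails).

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class VM:
    """Hypothetical record for a VM whose state is uncertain."""
    name: str
    seconds_since_last_disk_activity: float


def host_looks_dead(vms: Iterable[VM], inactivity_threshold_s: float) -> bool:
    """Step 1: true only if ALL VMs show no disk activity for the threshold."""
    vms = list(vms)
    if not vms:
        # Assumption: with no VMs there is no evidence, so stay conservative.
        return False
    return all(
        vm.seconds_since_last_disk_activity >= inactivity_threshold_s
        for vm in vms
    )


def recover_host(
    vms: Iterable[VM],
    inactivity_threshold_s: float,
    fence_host: Callable[[], bool],
    restart_vm: Callable[[VM], None],
) -> List[str]:
    """Run steps 1-3 and return the names of the VMs that were restarted."""
    vms = list(vms)
    if not host_looks_dead(vms, inactivity_threshold_s):
        return []  # Some VM is still writing: do nothing, avoid split brain.
    # Step 2: fence the host (stand-in for an IPMI power-off/reboot command).
    if not fence_host():
        # Assumption: without a confirmed fence, restarting is unsafe.
        return []
    # Step 3: restart the VMs elsewhere.
    restarted = []
    for vm in vms:
        restart_vm(vm)
        restarted.append(vm.name)
    return restarted
```

The fence and restart actions are passed in as callables so the ordering (check, then fence, then restart) can be exercised without real hardware; in a real deployment they would wrap whatever IPMI and orchestration calls are in use.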