cloudstack-users mailing list archives

From ilya musayev <ilya.mailing.li...@gmail.com>
Subject Re: KVM+HA
Date Tue, 18 Jul 2017 13:18:17 GMT
Apologies for the fragmented messages. In the existing framework, CloudStack
does not know for certain whether your VMs are dead, the KVM hypervisor has
crashed, it is just a network blip, or the KVM agent was stopped (or died).
It takes a conservative approach and does not restart the VMs on other
hypervisors, to avoid a split-brain scenario.

The only time it will restart a KVM hypervisor and move its VMs over is when
one of the hypervisors in the cluster loses access to primary storage, which
is detected using the NFS heartbeat method I mentioned earlier.
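
To illustrate the heartbeat idea, here is a minimal sketch of a staleness check a neighbor host could apply. It is not CloudStack's actual code: the function name, the timestamp argument and the 60-second timeout are all assumptions made for the example.

```python
import time


def heartbeat_is_stale(last_heartbeat_ts, now=None, timeout_s=60):
    """Return True when the last heartbeat write is older than timeout_s.

    last_heartbeat_ts: UNIX time of the last write seen for the host
                       (hypothetical input, e.g. parsed from the heartbeat file).
    timeout_s:         assumed threshold, not a CloudStack default.
    """
    if now is None:
        now = time.time()
    return (now - last_heartbeat_ts) > timeout_s
```

A stale heartbeat only says the host stopped writing to shared storage; on its own it does not tell you whether the VMs are still running.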

The new framework addresses the limitations above as follows:
1) it checks whether there is any disk activity on VMs that are in an
uncertain state; if there is no activity on ALL VMs for "x" seconds,
2) CloudStack issues an IPMI fence command to power down/reboot the host
(via iLO, DRAC or something similar), and
3) the VMs are restarted elsewhere.
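
The three steps above can be sketched roughly as follows. This is only an illustration of the decision order, not the framework's real implementation: `VmSample`, `recover_host`, and the `fence` and `restart_vm` callbacks are hypothetical names, and the idle threshold stands in for the "x" seconds.

```python
from dataclasses import dataclass


@dataclass
class VmSample:
    name: str
    seconds_since_disk_io: float  # time since disk activity was last observed


def all_vms_idle(samples, threshold_s):
    """Step 1: True only if every VM has been idle for at least threshold_s."""
    # An empty sample list is treated as "unknown", so we never fence on it.
    return bool(samples) and all(
        s.seconds_since_disk_io >= threshold_s for s in samples
    )


def recover_host(host, samples, threshold_s, fence, restart_vm):
    """Fence the host and restart its VMs only when it is safe to do so.

    fence(host) -> bool   hypothetical callback that issues the IPMI fence
    restart_vm(name)      hypothetical callback that starts the VM elsewhere
    Returns the names of the VMs that were restarted.
    """
    if not all_vms_idle(samples, threshold_s):
        return []          # some VM still shows disk activity: do nothing
    if not fence(host):    # step 2: fencing must succeed before any restart
        return []
    restarted = []
    for s in samples:      # step 3: restart the VMs on other hosts
        restart_vm(s.name)
        restarted.append(s.name)
    return restarted
```

The ordering matters: VMs are restarted only after the fence succeeds, which is what prevents two copies of the same VM from running at once.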

Regards
ilya

On Tue, Jul 18, 2017 at 6:10 AM, ilya musayev <ilya.mailing.lists@gmail.com>
wrote:

> What shared primary storage backend do you have for your VMs?
>
> If it is NFS, the CloudStack agent writes a heartbeat. When an issue occurs,
> the neighbor hosts check whether the failed hypervisor still writes to the
> heartbeat file. There are a bunch of corner cases where CloudStack HA does
> not kick in, due to uncertainty.
>
> The new framework should address those uncertainties.
>
> KVM HA with IPMI Fencing - Apache Cloudstack
> <https://cwiki.apache.org/confluence/display/CLOUDSTACK/KVM+HA+with+IPMI+Fencing>
> [CLOUDSTACK-8943] KVM HA is broken, let's fix it - ASF JIRA
> <https://issues.apache.org/jira/browse/CLOUDSTACK-8943>
>
> Regards
> ilya
>
> On Tue, Jul 18, 2017 at 6:06 AM, ilya musayev <
> ilya.mailing.lists@gmail.com> wrote:
>
>> Hi Victor
>>
>> We recently rewrote the KVM HA framework. It's being merged into the latest build.
>>
>>
>> On Tue, Jul 18, 2017 at 5:39 AM, victor <victor@ihnetworks.com> wrote:
>>
>>> Hello Guys,
>>>
>>> I am facing the same issue that is mentioned at the following URL.
>>>
>>> -----------------
>>>
>>> https://issues.apache.org/jira/browse/CLOUDSTACK-3535
>>>
>>> -------------
>>>
>>> When the host is put in maintenance mode, the HA-enabled VMs are
>>> automatically migrated to an available host. But when the KVM host is down,
>>> no HA is done. The VMs stay down until I bring the host node back up.
>>>
>>>
>>> I have tried everything, including the following.
>>>
>>> =====
>>>
>>> 1, System VMs and client VMs are created on shared storage
>>>
>>> 2, Added ha.tag host tags
>>>
>>> 3, Created the host with the HA tag added
>>>
>>> 4, Created VMs on the HA-enabled host with an HA-enabled service offering
>>>
>>> ====
>>>
>>> Have you guys successfully tested HA? I am really stuck at this part.
>>>
>>> Regards
>>>
>>>
>>>
>>>
>>
>
