cloudstack-dev mailing list archives

From Nux! <...@li.nux.ro>
Subject Re: HA issues
Date Wed, 17 Jan 2018 09:12:20 GMT
Right, sorry for using the terms interchangeably, I see what you mean.

I'll do further testing then as VM HA was also not working in my setup.

I'll be back.

--
Sent from the Delta quadrant using Borg technology!

Nux!
www.nux.ro

----- Original Message -----
> From: "Rohit Yadav" <rohit.yadav@shapeblue.com>
> To: "dev" <dev@cloudstack.apache.org>
> Sent: Wednesday, 17 January, 2018 09:09:19
> Subject: Re: HA issues

> Hi Lucian,
> 
> 
> The "Host HA" feature is entirely different from VM HA; however, they may work
> in tandem, so please stop using the terms interchangeably as it may cause the
> community to believe a regression has been introduced.
> 
> 
> The "Host HA" feature currently ships with only one "Host HA" provider, for KVM,
> that is strictly tied to out-of-band management (IPMI for fencing, i.e. power
> off, and recovery, i.e. reboot) and NFS (as primary storage). (We also have a
> provider for the simulator, but that's for coverage/testing purposes.)
> 
> 
> Therefore, "Host HA" for KVM (+nfs) currently works only when OOBM is enabled.
> The framework allows interested parties to write their own HA providers for a
> hypervisor that can use a different strategy/mechanism for fencing/recovery of
> hosts (including writing a non-IPMI based OOBM plugin) and a host/disk activity
> checker that is non-NFS based.
> 
> 
> The "Host HA" feature ships disabled by default and does not cause any
> interference with VM HA. However, when enabled and configured correctly, it is
> a known limitation that when it is unable to successfully perform recovery or
> fencing tasks it may not trigger VM HA. We can discuss how to handle such cases
> (thoughts?). "Host HA" would try a couple of times to recover and, failing to
> do so, it would eventually trigger a host fencing task. If it's unable to fence
> a host, it will indefinitely attempt to fence the host (the host state will be
> stuck at fencing state in the cloud.ha_config table, for example) and alerts
> will be sent to the admin, who can do some manual intervention to handle such
> situations (if you have email/SMTP enabled, you should see alert emails).
> 
> 
> We can discuss how to improve and have a workaround for the case you've hit,
> thanks for sharing.
> 
> 
> - Rohit
> 
> ________________________________
> From: Nux! <nux@li.nux.ro>
> Sent: Tuesday, January 16, 2018 10:42:35 PM
> To: dev
> Subject: Re: HA issues
> 
> Ok, reinstalled and re-tested.
> 
> What I've learned:
> 
> - HA only works now if OOB is configured; the old way of doing HA no longer
> applies. This can be good and bad, as not everyone has IPMIs.
> 
> - HA only works if IPMI is reachable. I've pulled the cord on a HV and HA failed
> to do its thing, leaving me with a HV down along with all the VMs running
> there. That's bad.
> I've opened this ticket for it:
> https://issues.apache.org/jira/browse/CLOUDSTACK-10234
> 
> Let me know if you need any extra info or stuff to test.
> 
> Regards,
> Lucian
> 
> --
> Sent from the Delta quadrant using Borg technology!
> 
> Nux!
> www.nux.ro
> 
> 
> rohit.yadav@shapeblue.com
> www.shapeblue.com
> 53 Chandos Place, Covent Garden, London  WC2N 4HS UK
> @shapeblue
>  
> 
> 
> ----- Original Message -----
>> From: "Nux!" <nux@li.nux.ro>
>> To: "dev" <dev@cloudstack.apache.org>
>> Sent: Tuesday, 16 January, 2018 11:35:58
>> Subject: Re: HA issues
> 
>> I'll reinstall my setup and try again, just to be sure I'm working on a clean
>> slate.
>>
>> --
>> Sent from the Delta quadrant using Borg technology!
>>
>> Nux!
>> www.nux.ro
>>
>> ----- Original Message -----
>>> From: "Rohit Yadav" <rohit.yadav@shapeblue.com>
>>> To: "dev" <dev@cloudstack.apache.org>
>>> Sent: Tuesday, 16 January, 2018 11:29:51
>>> Subject: Re: HA issues
>>
>>> Hi Lucian,
>>>
>>>
>>> If you're talking about the new HostHA feature (with KVM+nfs+ipmi), please refer
>>> to following docs:
>>>
>>> http://docs.cloudstack.apache.org/projects/cloudstack-administration/en/latest/hosts.html#out-of-band-management
>>>
>>> https://cwiki.apache.org/confluence/display/CLOUDSTACK/Host+HA
>>>
>>>
>>> We'll need you to look at the logs; perhaps create a JIRA ticket with the logs
>>> and details? If you saw an IPMI-based reboot, then host-ha indeed tried to
>>> recover, i.e. reboot the host. Once hostha has done its work it would schedule
>>> HA for VM as soon as the recovery operation succeeds (we've simulator and kvm
>>> based marvin tests for such scenarios).
>>>
>>>
>>> Can you see it making an attempt to schedule VM HA in the logs, or any failure?
>>>
>>>
>>> - Rohit
>>>
>>> <https://cloudstack.apache.org>
>>>
>>>
>>>
>>> ________________________________
>>> From: Nux! <nux@li.nux.ro>
>>> Sent: Tuesday, January 16, 2018 12:47:56 AM
>>> To: dev
>>> Subject: [4.11] HA issues
>>>
>>> Hi,
>>>
>>> I see there's a new HA engine for KVM and IPMI support which is really nice,
>>> however it seems hit and miss.
>>> I have created an instance with HA offering, kernel panicked one of the
>>> hypervisors - after a while the server was rebooted via IPMI probably, but the
>>> instance never moved to a running hypervisor and even after the original
>>> hypervisor came back it was still left in Stopped state.
>>> Are there any extra things I need to set up to have proper HA?
>>>
>>> Regards,
>>> Lucian
>>>
>>> --
>>> Sent from the Delta quadrant using Borg technology!
>>>
>>> Nux!
>>> www.nux.ro
>>>
>>> rohit.yadav@shapeblue.com
>>> www.shapeblue.com
>>> 53 Chandos Place, Covent Garden, London  WC2N 4HS UK
>>> @shapeblue
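
For readers following the thread, the Host HA behaviour Rohit describes in his 17 January reply (a couple of recovery attempts, then fencing retried indefinitely with alerts to the admin, and VM HA possibly never triggered if both keep failing) can be sketched roughly as follows. This is an illustrative Python model only, not CloudStack code: the class, the `recover`/`fence` callables, and the retry counts are hypothetical, and it assumes VM HA is scheduled after either a successful recovery or a successful fence.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Assumption: the thread only says "a couple of times"; 2 is a placeholder.
MAX_RECOVERY_ATTEMPTS = 2


@dataclass
class HostHASketch:
    """Toy model of the recover-then-fence flow described in the thread."""
    recover: Callable[[], bool]  # hypothetical: OOBM reboot, True on success
    fence: Callable[[], bool]    # hypothetical: OOBM power off, True on success
    # The thread says fencing is retried indefinitely; the sketch bounds it
    # so that the example terminates.
    max_fence_attempts: int = 5
    log: List[str] = field(default_factory=list)

    def handle_failed_host(self) -> str:
        # Step 1: try to recover (reboot) the host a limited number of times.
        for _ in range(MAX_RECOVERY_ATTEMPTS):
            self.log.append("recovering")
            if self.recover():
                self.log.append("schedule VM HA")
                return "recovered"
        # Step 2: recovery failed, so try to fence (power off) the host.
        for _ in range(self.max_fence_attempts):
            self.log.append("fencing")
            if self.fence():
                self.log.append("schedule VM HA")
                return "fenced"
            # Each failed fence attempt raises an alert for the admin.
            self.log.append("alert admin")
        # Still unable to fence: the host stays in the fencing state and
        # VM HA has not been scheduled.
        return "fencing"


if __name__ == "__main__":
    # IPMI unreachable: neither recovery nor fencing can succeed.
    unreachable = HostHASketch(recover=lambda: False, fence=lambda: False)
    print(unreachable.handle_failed_host(), unreachable.log)
```

With both callables returning False, as when the IPMI interface is unreachable, the sketch never reaches the "schedule VM HA" step, which appears to be the situation reported in CLOUDSTACK-10234.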
