cloudstack-dev mailing list archives

From Rohit Yadav <>
Subject Re: [DISCUSS][FS] Host HA for CloudStack
Date Tue, 21 Feb 2017 05:57:31 GMT
Hi David,

Thank you for your questions.

As per the FS, the HA framework implementation is agnostic of the resource type and is not
tied to how HA is performed, which separates policy from mechanism. The task of fencing
is implemented by an HA provider, which is implementation-specific.

The first version will include an HA provider for KVM (with NFS-backed primary storage). In
it, we've chosen to put the host into maintenance mode when it is fenced (by oobm/ipmi),
and the admin is required to manually put it back into the pool (i.e. remove it from maintenance
mode), because doing this automatically may have side effects. Also, because the HA framework
is separated from the hypervisor/storage-specific logic, anyone is free to implement their own
HA provider with custom logic, options and algorithms (as a plugin).
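
To illustrate the policy/mechanism split described above, here is a minimal sketch in Python.
It is not CloudStack's actual API: the names Host, HAProvider, KvmNfsProvider and HAFramework
are hypothetical, and the fence call is only a stand-in for an OOBM/IPMI power-off.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical, simplified view of a host resource.
    name: str
    in_maintenance: bool = False
    powered_on: bool = True


class HAProvider(ABC):
    """Mechanism: resource-specific health checks and fencing (hypothetical interface)."""

    @abstractmethod
    def is_healthy(self, host: Host) -> bool: ...

    @abstractmethod
    def fence(self, host: Host) -> bool: ...


class KvmNfsProvider(HAProvider):
    """Illustrative provider; a real one would probe NFS activity and call OOBM/IPMI."""

    def __init__(self, unhealthy_hosts):
        self.unhealthy_hosts = set(unhealthy_hosts)

    def is_healthy(self, host: Host) -> bool:
        return host.name not in self.unhealthy_hosts

    def fence(self, host: Host) -> bool:
        host.powered_on = False  # stand-in for an out-of-band power-off
        return True


class HAFramework:
    """Policy: decides when to fence, without knowing about KVM, NFS or IPMI."""

    def __init__(self, provider: HAProvider):
        self.provider = provider

    def check(self, host: Host) -> str:
        if self.provider.is_healthy(host):
            return "healthy"
        # Maintenance mode is set before fencing; an admin must clear it manually.
        host.in_maintenance = True
        return "fenced" if self.provider.fence(host) else "fence-failed"
```

Under this sketch, supporting another hypervisor or storage backend would mean writing
another HAProvider subclass, leaving HAFramework unchanged.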

We can start by getting the HA framework and an initial HA provider (driver implementation)
reviewed and accepted, and over time support for other hypervisors and storage options such
as Ceph can be added.


From: David Mabry <>
Sent: 18 February 2017 03:40
Subject: Re: [DISCUSS][FS] Host HA for CloudStack


First, thanks for all the work you have put into this.  This is something that CS has sorely
needed for a long time.

A couple of items:

1.) You state the following:
“Before invoking the HA provider’s fence operation, the HA resource management will place
the resource in maintenance mode. The intention is to require an administrator to manually
verify that a resource is ready to return service by requiring an administrator to take it
out of maintenance mode.”
I agree that putting a host in maintenance mode to require manual intervention in order to
bring it back online is ideal and honestly how I would probably prefer to do it.  However,
I also like to give the end user/operator choice.  Perhaps we could add an option to bring
the Host out of Maintenance mode automatically if it passes all checks and comes back into
an ELIGIBLE state.  This way, if the operator chooses, the host could come back into full
operation and start recovering VMs if needed.  This could also be handy if your environment
isn’t quite n+1 when it comes to host capacity and you need to have the host back up and
running as soon as possible to minimize the outage duration.  Again, I know it isn’t ideal,
but I don’t see the harm in giving the operator the choice.

2.) You state the following:
“For the initial release, only KVM with NFS storage will be supported. However, the storage
check component will be implemented in a modular fashion allowing for checks using other storage
platforms (e.g. Ceph) in the future. HA provider plugins can be implemented for other hypervisors.”
We are using KVM with a Ceph backend and would be very interested in helping make it a part
of the initial push for this feature.  I have a Dev environment backed by Ceph that we could
use for testing and would be willing to help with the development of the Ceph activity checks.
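
As a rough illustration of the operator choice suggested in item 1, the sketch below shows an
opt-in automatic release from maintenance mode. Everything here is an assumption for
illustration only: HostState, maybe_release and the auto_release flag are hypothetical names
and are not part of the FS or of CloudStack.

```python
from dataclasses import dataclass


@dataclass
class HostState:
    # Hypothetical, simplified host state.
    in_maintenance: bool
    eligible: bool  # True once the host passes all checks again


def maybe_release(host: HostState, auto_release: bool = False) -> bool:
    """Take the host out of maintenance only if the operator opted in.

    Returns True if the host was released. With auto_release left at its
    default (False), the host stays in maintenance until an admin acts.
    """
    if host.in_maintenance and auto_release and host.eligible:
        host.in_maintenance = False
        return True
    return False
```

Keeping the flag off by default would preserve the manual-verification behaviour described
in the FS, while letting operators who are short on capacity opt in.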

I’m looking forward to getting this feature added to CS.  Again, great job putting this
together and starting the conversation.
