cloudstack-dev mailing list archives

From Remi Bergsma <r...@remi.nl>
Subject Re: [DISCUSS] XenServer and HA: the way forward
Date Thu, 28 May 2015 11:53:01 GMT
Hi Stephen,

Thanks for getting in touch!

First of all, good to hear we agree that this change wasn't done in a nice
way. For now, I think we should focus on documenting this change properly
so folks are aware they need to change their setup. I wrote some
documentation already that I'll port to 4.4 and 4.5 as well. Apart from
that, it's not only "turning on HA" since many operations will not work as
expected with HA on.

As for having CloudStack turn on HA, this is not as easy as it sounds. To
enable HA, the pool needs an SR (storage repository) that the heartbeat
can be written to. We use NFS so we reused that, but if one happens to use
iSCSI or Fibre Channel then you need to create a new SR. We could indeed add
warnings when HA is turned off. Then there is the case with older
versions that require licenses for HA to work. How do you suggest we handle
those?
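
To make the preconditions above concrete, here is a minimal sketch of the
kind of check that could produce those warnings. Everything in it is an
assumption for illustration: PoolState and ha_warnings are hypothetical
names, not CloudStack or XenAPI code, and the three inputs would have to be
filled in from the real pool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PoolState:
    """Hypothetical snapshot of a XenServer pool (not a real API type)."""
    ha_enabled: bool
    heartbeat_sr: Optional[str]  # UUID of an SR usable for the heartbeat, if any
    ha_licensed: bool            # older XenServer versions need a license for HA


def ha_warnings(pool: PoolState) -> List[str]:
    """Return warnings for the operator; an empty list means nothing to report."""
    if pool.ha_enabled:
        return []
    warnings = ["XenHA is off: no new pool master will be elected on failure"]
    if not pool.ha_licensed:
        warnings.append("this XenServer version needs a license before HA can be enabled")
    if pool.heartbeat_sr is None:
        warnings.append("no heartbeat SR available: create one before enabling HA")
    return warnings


if __name__ == "__main__":
    # Example: an iSCSI-only pool on a licensed version, with HA still off.
    for line in ha_warnings(PoolState(ha_enabled=False, heartbeat_sr=None, ha_licensed=True)):
        print("WARNING:", line)
```

The sketch only reports problems and never enables HA itself, since creating
a heartbeat SR or dealing with licenses is something an operator would
probably want to decide on.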

Are there any other things I missed? So far this has been a big "reverse
engineering" experience so it'd be nice if you guys could confirm we've got
it all covered now. Also, I'd appreciate it if you could share some info
on the problems that you encountered so we can learn from it and understand
it better.

Final question: can you please take care of documenting the timeout setting
in the XenServer docs? It's a shame it is undocumented.

Looking forward to working together and making this work well again!

Regards,
Remi


2015-05-27 11:45 GMT+02:00 Stephen Turner <Stephen.Turner@citrix.com>:

> I'm sorry to come late to this thread, but I only picked it up from Remi's
> blog post [*] over the weekend.
>
> I'm certainly not going to defend the way this change came in under the
> radar, but speaking as a member of the XenServer development team, I
> wouldn't want to go back to the old behaviour. The risk is not just
> theoretical: we had at least one customer with serious data corruption
> problems as a result of the bad interaction between the CloudStack code and
> XenServer. I wonder if there's an alternative possibility where CloudStack
> makes sure that XenServer HA is turned on, and turns it on itself / gives
> you warnings if it isn't / something?
>
> --
> Stephen Turner
>
> [*]
> http://blog.remibergsma.com/2015/05/23/making-xenserver-and-cloudstack-sing-and-dance-together-again/
>
>
>
> -----Original Message-----
> From: Remi Bergsma [mailto:remi@remi.nl]
> Sent: 04 May 2015 11:04
> To: dev@cloudstack.apache.org
> Subject: [DISCUSS] XenServer and HA: the way forward
>
> Hi all,
>
> Since CloudStack 4.4 the implementation of HA in CloudStack was changed to
> use the XenHA feature of XenServer. As of 4.4, it is expected to have XenHA
> enabled for the pool (not for the VMs!) and so XenServer will be the one to
> elect a new pool master, whereas CloudStack did it before. Also, XenHA
> takes care of fencing the box instead of CloudStack should storage be
> unavailable. To be exact, they both try to fence but XenHA is usually
> faster.
>
> To be 100% clear: HA on VMs is in all cases done by CloudStack. It's just
> that without a pool master, no VMs will be recovered anyway. This brought
> some headaches to me, as first of all I didn't know. We probably need to
> document this somewhere. This is important, because without XenHA turned on
> you'll not get a new pool master (a behaviour change).
>
> Personally, I don't like the fact that we have "two captains" in case
> something goes wrong. But, some say they like this behaviour. I'm OK with
> both, as long as one can choose whatever suits their needs best.
>
> In Austin I talked to several people about this. We came up with the idea
> to have CloudStack check whether XenHA is on or not. If it is, it does the
> current 4.4+ behaviour (XenHA selects new pool master). When it is not, we
> do the CloudStack 4.3 behaviour where CloudStack is fully in control.
>
> I also talked to Tim Mackey and he wants to help implement this, but he
> doesn't have much time. The idea is to have someone else join in to code
> the change and then Tim will be able to help out on a regular basis
> should we need in-depth knowledge of XenServer or its implementation in
> CloudStack.
>
> Before we kick this off, I'd like to discuss and agree that this is the
> way forward. Also, if you're interested in joining this effort let me know
> and I'll kick it off.
>
> Regards,
> Remi
>
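
The proposal in the quoted message (follow XenHA when it is on, fall back to
the 4.3 behaviour when it is off) boils down to a single branch. A minimal
sketch, assuming the HA state of the pool is already known as a boolean;
MasterElection and who_elects_pool_master are hypothetical names and not
existing CloudStack code. In a real implementation the flag would presumably
be read from the pool's ha_enabled field in XenAPI.

```python
from enum import Enum


class MasterElection(Enum):
    """Who is responsible for electing a new pool master (hypothetical type)."""
    XENHA = "xenha"            # CloudStack 4.4+ behaviour
    CLOUDSTACK = "cloudstack"  # CloudStack 4.3 behaviour


def who_elects_pool_master(xenha_enabled: bool) -> MasterElection:
    """Pick the election mode from the pool's HA state."""
    if xenha_enabled:
        # XenHA is on: XenServer elects the new pool master and fences hosts.
        return MasterElection.XENHA
    # XenHA is off: CloudStack stays fully in control, as it was in 4.3.
    return MasterElection.CLOUDSTACK


if __name__ == "__main__":
    print(who_elects_pool_master(True).value)
    print(who_elects_pool_master(False).value)
```

Either way, HA for the VMs themselves stays with CloudStack; the branch only
decides who recovers the pool master.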
