From: Tim Mackey
Date: Mon, 4 May 2015 11:33:45 -0400
Subject: Re: [DISCUSS] XenServer and HA: the way forward
To: dev@cloudstack.apache.org

Thanks for starting this thread, Remi.

From my perspective, the pros of simply enabling XenServer HA are:

- automatic election of a pool master in the event of hardware failure
- automatic fencing of a host in the event of dom0 corruption
- automatic fencing of a host in the event of heartbeat failure

The risks of simply enabling XenServer HA are:

- additional code to detect a newly elected pool master
- acceptance of the fact that an admin can force a new pool master from the XenServer CLI
- requirement for the pool size to be greater than 2 (a pool size of 2 results in semi-deterministic fencing, which isn't obvious to the user)
- understanding that the storage heartbeat can be shorter than the storage timeout (aggressive fencing)
- understanding that HA plans are computed even when no VMs are protected (a performance decrease)

One question we'll want to decide on is who the primary actor is when it comes to creating the pool, since that defines the first pool master. During my demo build using 4.4 at CCCEU I expected to add pool members through the CloudStack UI, but found that adding them in XenServer was required. This left me in an indeterminate state with regard to pool members.
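The pool-size and shared-storage constraints above can be sketched as a simple eligibility check. This is an illustrative helper only: the function name and parameters are invented for the example and are not CloudStack's actual code, and a real implementation would read the pool state via XAPI rather than take booleans.

```python
# Sketch: decide whether a XenServer pool qualifies for XenServer HA,
# per the constraints discussed in this thread. Hypothetical helper,
# not CloudStack's actual implementation.

def ha_eligible(pool_size: int, has_shared_storage: bool) -> bool:
    """XenServer HA needs shared storage for the heartbeat SR, and the
    thread recommends more than two hosts: a two-host pool fences
    semi-deterministically on a split, which surprises users."""
    return has_shared_storage and pool_size >= 3

# A two-host pool is rejected even with shared storage:
assert ha_eligible(2, True) is False
assert ha_eligible(3, True) is True
assert ha_eligible(5, False) is False
```

The check deliberately treats a two-host pool as ineligible rather than merely warning, matching the "greater than 2" requirement listed above.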
I vote that if a host is added to CloudStack and it *is* already a member of a pool, the pool be imported as a cluster and any future membership changes happen through CloudStack APIs. If a host is added which isn't a member of a pool, the user should be asked whether they wish to add it to an existing cluster (and, behind the scenes, to a pool), or to create a new cluster and add it to that. This would be a change to the "add host" semantics. Once the host is added, we can enable XenServer HA on the pool if it satisfies the requirements for XenServer HA (shared storage and three or more pool members).

There are some details we'd want to take care of, but this flow makes sense to me, and we could use it even with upgrades.

-tim

On Mon, May 4, 2015 at 6:04 AM, Remi Bergsma wrote:

> Hi all,
>
> Since CloudStack 4.4, the implementation of HA in CloudStack has changed
> to use the XenHA feature of XenServer. As of 4.4, XenHA is expected to be
> enabled for the pool (not for the VMs!), so XenServer is the one that
> elects a new pool master, whereas CloudStack did this before. XenHA also
> takes care of fencing the box, instead of CloudStack, should storage
> become unavailable. To be exact, they both try to fence, but XenHA is
> usually faster.
>
> To be 100% clear: HA on VMs is in all cases done by CloudStack. It's just
> that without a pool master, no VMs will be recovered anyway. This brought
> me some headaches, as at first I didn't know. We probably need to
> document this somewhere. This is important, because without XenHA turned
> on you'll not get a new pool master (a behaviour change).
>
> Personally, I don't like the fact that we have "two captains" in case
> something goes wrong. But some say they like this behaviour. I'm OK with
> both, as long as one can choose whatever suits their needs best.
>
> In Austin I talked to several people about this. We came up with the idea
> of having CloudStack check whether XenHA is on or not.
> If it is, CloudStack keeps the current 4.4+ behaviour (XenHA selects the
> new pool master). If it is not, we fall back to the CloudStack 4.3
> behaviour, where CloudStack is fully in control.
>
> I also talked to Tim Mackey and he wants to help implement this, but he
> doesn't have much time. The idea is to have someone else join in to code
> the change; Tim will then be able to help out on a regular basis should
> we need in-depth knowledge of XenServer or its implementation in
> CloudStack.
>
> Before we kick this off, I'd like to discuss and agree that this is the
> way forward. Also, if you're interested in joining this effort, let me
> know and I'll kick it off.
>
> Regards,
> Remi
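The split behaviour proposed in the quoted mail can be sketched as a small dispatch. This is illustrative only: the function and the strategy labels are invented for the example, and in practice CloudStack would read the pool's HA flag over XAPI rather than receive a boolean.

```python
# Sketch of the proposed decision: if XenHA is enabled on the pool,
# let XenServer elect the pool master (4.4+ behaviour); otherwise
# CloudStack stays fully in control (4.3 behaviour). Names are
# hypothetical, not actual CloudStack code.

def master_election_strategy(xen_ha_enabled: bool) -> str:
    if xen_ha_enabled:
        # XenServer HA elects the new master; CloudStack only has to
        # detect the newly elected pool master.
        return "xenserver-elects-master"
    # CloudStack 4.3-style: CloudStack itself promotes a new master.
    return "cloudstack-elects-master"

assert master_election_strategy(True) == "xenserver-elects-master"
assert master_election_strategy(False) == "cloudstack-elects-master"
```

Keeping the decision in one place like this would make the "two captains" situation explicit: exactly one actor is responsible for master election, chosen by the pool's HA setting.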