Delivered-To: mailing list users@cloudstack.apache.org
From: Yiping Zhang
To: "users@cloudstack.apache.org"
Date: Sat, 14 Feb 2015 22:00:05 -0600
Subject: Re: Cloudstack + XenServer 6.2 + NetApp in production

Tim,

Thanks for the reply.

In our case, the NetApp cluster as a whole did not fail. The failover happened
because the Operations team was performing scheduled maintenance; that is
normal behavior. To the best of my knowledge, a NetApp head failover should
take anywhere from 10 to 15 seconds.

As you guessed, our XenServer resource pool does have HA enabled, and the HA
heartbeat SR is indeed on the same NetApp cluster as the primary storage SR.
I am not sure, though, whether enabling pool HA is the cause of the XenServer
reboots in this particular scenario.

I am also not sure I understand your statement that "In that case, HA would
detect the storage failure and fence the XenServer host." Can you elaborate a
little more on this?

Thanks again,

Yiping

On 2/14/15, 6:26 AM, "Tim Mackey" wrote:

>Yiping,
>
>The specific problem covered by that note was solved a long time ago.
>Timeouts can be caused by a number of things, and if the entire NetApp
>cluster went offline, the XenServer host would be impacted.
>Since you are experiencing a host reboot when this happens, I suspect you
>have XenServer HA enabled with the heartbeat on the same NetApp cluster.
>In that case, HA would detect the storage failure and fence the XenServer
>host.
>
>The solution here would be to understand why your NetApp cluster failed
>during scheduled maintenance. Something in your configuration has created
>a single point of failure. If you've enabled HA, I would also like to
>understand why you've chosen to do that. Going slightly commercial for a
>second, I would also advise you to look into a commercial support contract
>for your production XenServer hosts. That team is going to be able to go
>deeper, and much quicker, when production issues arise than this list.
>NetApp and XenServer are used in a very large number of deployments, so if
>there is something wrong they'll be more likely to know. For example,
>there could be a set of XenServer or OnTap patches to help sort this out.
>
>-tim
>
>On Fri, Feb 13, 2015 at 7:36 PM, Yiping Zhang wrote:
>
>> Hi, all:
>>
>> I am wondering if anyone is running their CloudStack in production
>> deployments with XenServer 6.2 + NetApp clusters?
>>
>> Recently, in our non-production deployment (RHEL 6.6 + CS 4.3.0 +
>> XenServer 6.2 cluster + NetApp cluster), all our XenServer hosts
>> rebooted automatically because of an NFS timeout, when a NetApp cluster
>> failover happened during scheduled filer maintenance. My Google search
>> turned up this Citrix hotfix for XenServer 6.0.2:
>> http://support.citrix.com/article/CTX135623 and this post about
>> XenServer 6.2: http://www.gossamer-threads.com/lists/xen/devel/320020 .
>>
>> Obviously the problem still exists for XenServer 6.2, and we are very
>> concerned about going to production deployment based on this technology
>> stack.
>>
>> If anyone has a similar setup, please share your experiences.
>>
>> Thanks,
>>
>> Yiping
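For readers in a similar setup, here is a rough sketch of how one might
confirm whether the pool HA heartbeat lives on the same filer as primary
storage, using the XenServer xe CLI. This is an illustration, not a verified
procedure: the UUIDs are placeholders, and disabling HA around planned filer
maintenance is only one possible mitigation, not necessarily the root-cause
fix Tim describes above.

```shell
# Sketch: inspect pool HA configuration on a XenServer host (run as root).
# All UUIDs shown are placeholders for your own pool/SR/VDI UUIDs.

# Is HA enabled on the pool?
xe pool-list params=ha-enabled

# Find the HA statefile VDI (the heartbeat), then the SR that contains it.
POOL_UUID=$(xe pool-list --minimal)
VDI_UUID=$(xe pool-param-get uuid="$POOL_UUID" param-name=ha-statefiles)
xe vdi-param-get uuid="$VDI_UUID" param-name=sr-uuid

# Compare that SR UUID against the NFS SRs used for primary storage.
xe sr-list type=nfs params=uuid,name-label

# One possible mitigation (assumption, not advice from the thread): disable
# HA before a planned filer failover so a brief NFS outage does not cause
# the hosts to fence (self-reboot), then re-enable it afterwards.
xe pool-ha-disable
# ... perform storage maintenance ...
xe pool-ha-enable heartbeat-sr-uuids=<heartbeat-sr-uuid>
```

If the statefile SR and the primary storage SR resolve to the same filer, the
heartbeat shares the single point of failure that Tim's reply warns about.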