From: Geoff Higginbottom
To: "users@cloudstack.apache.org"
Subject: RE: Primary storage failure
Date: Wed, 3 Jul 2013 18:38:06 +0000

Dean,

I am guessing you are using NFS for your Primary Storage.

This is actually 'by design'. The logic is that if the storage goes offline, then all VMs must have also failed, and a 'forced' reboot of the Host 'might' automatically fix things.

This is great if you only have one Primary Storage, but typically you have more than one, so whilst the reboot might fix the failed storage, it will also kill off all the perfectly good VMs which were still happily running.

The fix for XenServer Hosts is to:

1. Modify /opt/xensource/bin/xenheartbeat.sh on all your Hosts, commenting out the two entries which have "reboot -f"
2. Identify the PID of the script - pidof -x xenheartbeat.sh
3. Restart the Script - kill <PID> (using the PID from step 2)
4. Force reconnect Host from the UI, the script will then re-launch on reconnect

If you are running KVM, I'm guessing there is a similar script, but I have not tried this yet for anything other than XenServer (it does not apply to ESXi).

Regards

Geoff Higginbottom

D: +44 20 3603 0542 | S: +44 20 3603 0540 | M: +447968161581
geoff.higginbottom@shapeblue.com

-----Original Message-----
From: Dean Kamali [mailto:dean.kamali@gmail.com]
Sent: 03 July 2013 19:14
To: users@cloudstack.apache.org
Subject: Primary storage failure

Hello everyone

I'm testing failure scenarios, and I have noticed that as soon as the primary storage goes offline, the CloudStack management server seems to think that the hypervisor is not responding and reboots the node. If you have a number of nodes, it will eventually reboot all of them (losing everything .. fun!).

What if I have multiple primary storage pools and one of them fails? Will it reboot all of my hypervisors? That doesn't seem right to me.

Is there a way to control this behavior? It seems that the CloudStack management server needs to be a little smarter.
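
For anyone wanting to script steps 1 to 3 of the XenServer fix in the reply above, here is a minimal sketch, not a tested procedure. The function names and the HEARTBEAT variable are made up for illustration, and it assumes each "reboot -f" entry sits on its own line in the script. Step 4 (force reconnect from the UI) is still done by hand.

```shell
#!/bin/sh
# Sketch only. HEARTBEAT defaults to the path named in the reply above;
# point it at a copy of the script to try this out safely.
HEARTBEAT="${HEARTBEAT:-/opt/xensource/bin/xenheartbeat.sh}"

# Step 1: comment out every line containing "reboot -f" that is not
# already a comment. A backup is kept next to the file as <file>.bak.
disable_forced_reboot() {
    script="$1"
    cp -p "$script" "$script.bak" || return 1
    # Write back through the existing file so its permissions are kept.
    sed '/^[[:space:]]*#/!s/.*reboot -f.*/#&/' "$script.bak" > "$script"
}

# Steps 2 and 3: find the running script with pidof -x and kill it.
# Returns quietly if the script is not running.
stop_heartbeat() {
    pids=$(pidof -x "$(basename "$1")") || return 0
    # Unquoted on purpose: pidof may print several PIDs.
    kill $pids
}

# Example, run as root on each Host:
#   disable_forced_reboot "$HEARTBEAT" && stop_heartbeat "$HEARTBEAT"
```

One caveat: if a "reboot -f" line is the only statement inside an if/then branch, commenting it out leaves an empty branch, which the shell rejects as a syntax error. Check the edited file with `sh -n` before reconnecting the Host; replacing the line with the no-op `:` instead of a comment avoids the problem.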