From: Frank Louwers
Date: Mon, 12 Oct 2015 09:54:54 +0200
Subject: Re: KVM HA is broken, let's fix it
To: users@cloudstack.apache.org
Cc: "dev@cloudstack.apache.org"
In-Reply-To: <1739AED7-F5B3-4D6E-87C7-101844D10E09@schubergphilis.com>
References: <108236900.12374.1444432732520.JavaMail.zimbra@li.nux.ro> <1739AED7-F5B3-4D6E-87C7-101844D10E09@schubergphilis.com>

> On 10 Oct 2015, at 12:35, Remi Bergsma wrote:
>
> Can you please explain what the issue is with KVM HA? In my tests, HA starts all VMs just fine without the hypervisor coming back. At least that is on current 4.6. Assuming a cluster of multiple nodes of course. It will then do a neighbor check from another host in the same cluster.
>
> Also, malfunctioning NFS leads to corruption and therefore we fence a box when the shared storage is unreliable. Combining primary and secondary NFS is not a good idea for production in my opinion.

Well, it depends how you look at it, and what your situation is.

If you use 1 NFS export as primary storage (and only NFS), then yes, the system works as one would expect, and doesn’t need to be fixed.

However, HA is “not functioning” in any of these scenarios:

- you don’t use NFS as your only primary storage
- you use more than one NFS primary storage

Even worse: imagine you only use local storage as primary storage, but have 1 NFS configured (as the UI “wizard” forces you to configure one). You don’t have any active VM configured on that NFS primary storage. You then perform maintenance on the NFS storage, and take it offline…

All your hosts will then reboot, resulting in major downtime that’s completely unnecessary. There’s not even an option to disable this at this point… We’ve removed the reboot instructions from the HA script on all our instances…

Regards,

Frank