Message-ID: <1476372677.5744.55.camel@secretresearchfacility.com>
Subject: Re: Long downtimes for VMs through automatically triggered storage migration
From: Stephan Seitz
To: users@cloudstack.apache.org
Date: Thu, 13 Oct 2016 17:31:17 +0200
> > What we are still thinking about is whether it is, in principle, a
> > good idea to limit CloudStack in its ability to freely and
> > automatically migrate VMs between all cluster nodes. Is setting
> > "enable.ha.storage.migration"=false the intended way to handle a
> > setup with multiple clusters, or is it a dirty hack to circumvent
> > disadvantages of our setup? In the latter case we would like to
> > know that, keep a focus on alternatives, and be ready to improve
> > our setup in the mid-term.

> The logic around having multiple primary storage options tied to
> clusters is really designed to limit failure domains. Ideally you
> want to spread your workloads across different failure domains so
> that if you do lose a primary storage system, you still have services
> up and running.

> We build redundancy into the cluster and the storage attached to the
> cluster. We also run multiple clusters within a pod. If you spread
> your redundant VMs across multiple clusters (with their own primary
> storage), it's easier to absorb a catastrophic storage failure, as
> your eggs aren't all in one basket.

Thank you for this explanation.

> We turn off HA storage migration, as it doesn't make much sense to
> us. It assumes the storage is still up, as you obviously can't
> migrate a VM to a different primary storage if it's down. If you have
> enough hosts in a cluster, you should never run into a situation
> where you can't bring all your VMs back up due to host failure. So in
> that sense, HA storage migration is a pointless feature if you build
> and scale your clusters properly.
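For anyone following along: the global setting mentioned above can be changed in the UI under Global Settings, or via CloudMonkey along these lines. This is a sketch from memory, not verified against a live installation; only the setting name comes from this thread.

```shell
# Flip the global setting via CloudMonkey (assumes a configured
# CloudMonkey profile pointing at your management server):
cloudmonkey update configuration name=enable.ha.storage.migration value=false

# Most global settings only take effect after a management server
# restart:
service cloudstack-management restart
```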
Indeed, even with enough hosts we ran into that situation, due to a bug with multiple data disks and HVM introduced via https://github.com/apache/cloudstack/pull/792 . As a result, ACS tries to start the VM on node 1, 2, 3, and so on, and fails on all hosts due to the underlying qemu-dm parsing error. Finally it tries to start the VM on nodes of another cluster, which subsequently triggers a storage migration.

- Stephan