From: Rafael Weingärtner
Date: Mon, 18 Dec 2017 07:20:25 -0200
Subject: Re: [Discuss] Management cluster / Zookeeper holding locks
To: "users@cloudstack.apache.org"
Cc: dev

How hard is it to configure Zookeeper and get everything up and running?
BTW: what would Zookeeper be managing? CloudStack management servers or
MySQL nodes?

On Mon, Dec 18, 2017 at 7:13 AM, Ivan Kudryavtsev wrote:

> Hello, Marc-Aurele, I strongly believe that all MySQL locks should be
> removed in favour of a true DLM solution like Zookeeper. The performance of
> a 3-node ZK ensemble should be enough to hold up to 1000-2000 locks per
> second, and it helps to move to a truly clustered MySQL like Galera without
> a single master server.
>
> 2017-12-18 15:33 GMT+07:00 Marc-Aurèle Brothier:
>
> > Hi everyone,
> >
> > I was wondering how many of you are running CloudStack with a cluster of
> > management servers. I would think most of you, but it would be nice to
> > hear everyone's voices. And do you get hosts going over their capacity
> > limits?
> >
> > We discovered that during VM allocation, if you get a lot of parallel
> > requests to create new VMs, most notably with large profiles, the
> > capacity increase is done too long after the host capacity checks, which
> > results in hosts going over their capacity limits. To detail the steps:
> > the deployment planner checks for cluster/host capacity and picks one
> > deployment plan (zone, cluster, host). The plan is stored in the database
> > under a VMwork job, and another thread picks up that entry and starts the
> > deployment, increasing the host capacity and sending the commands.
> > Here there's a time gap of a couple of seconds between the host being
> > picked and the capacity increase for that host, which is more than enough
> > to go over the capacity on one or more hosts. A few VMwork jobs can be
> > added to the DB queue targeting the same host before one gets picked up.
> >
> > To fix this issue, we're using Zookeeper to act as the multi-JVM lock
> > manager, thanks to their Curator library (
> > https://curator.apache.org/curator-recipes/shared-lock.html). We also
> > changed the time when the capacity is increased, which now occurs pretty
> > much right after the deployment plan is found and inside the Zookeeper
> > lock. This ensures we don't go over the capacity of any host, and it has
> > proven effective for a month in our management server cluster.
> >
> > This adds another potential requirement, which should be discussed before
> > proposing a PR. Today the code works seamlessly without ZK too, to ensure
> > it's not a hard requirement, for example in a lab.
> >
> > Comments?
> >
> > Kind regards,
> > Marc-Aurèle
>
> --
> With best regards, Ivan Kudryavtsev
> Bitworks Software, Ltd.
> Cell: +7-923-414-1515
> WWW: http://bitworks.software/

--
Rafael Weingärtner
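
For readers following the thread, the check-then-reserve gap described in the quoted message can be illustrated with a small, self-contained sketch. This is not CloudStack or Curator code: `Host`, `plan_then_reserve_later` and `plan_and_reserve_under_lock` are hypothetical names, the CPU numbers are made up, and a local `threading.Lock` merely stands in for the cross-JVM Zookeeper lock, so the sketch only models a single process.

```python
import threading
from collections import deque


class Host:
    """Hypothetical host with a fixed CPU capacity (illustration only)."""

    def __init__(self, total_cpu):
        self.total_cpu = total_cpu
        self.used_cpu = 0

    def fits(self, cpu):
        return self.used_cpu + cpu <= self.total_cpu


def plan_then_reserve_later(host, requests):
    """Flow described as problematic: check at planning time, reserve when the job runs."""
    queue = deque()
    for cpu in requests:
        # The check does not see capacity claimed by jobs still waiting in the queue.
        if host.fits(cpu):
            queue.append(cpu)
    # A worker picks the queued jobs up later and only then increases used capacity.
    while queue:
        host.used_cpu += queue.popleft()
    return host.used_cpu


def plan_and_reserve_under_lock(host, requests, lock):
    """Flow described as the fix: check and reserve together while holding the lock."""

    def planner(cpu):
        with lock:  # stand-in for the distributed lock; assumption, not Curator's API
            if host.fits(cpu):
                host.used_cpu += cpu

    threads = [threading.Thread(target=planner, args=(cpu,)) for cpu in requests]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return host.used_cpu


requests = [4, 4, 4, 4]  # four parallel VM requests of 4 CPUs each
racy = plan_then_reserve_later(Host(total_cpu=8), requests)
safe = plan_and_reserve_under_lock(Host(total_cpu=8), requests, threading.Lock())
print(racy, safe)
```

In the first flow every request passes the check because nothing has been reserved yet, so the host ends up with 16 CPUs in use against a capacity of 8. In the second flow the check and the reservation are one critical section, so usage stops at 8. In a real management server cluster the lock would have to be shared across JVMs, which is the role the thread assigns to Zookeeper.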