From: Marc-Aurèle Brothier
Date: Mon, 18 Dec 2017 10:47:51 +0100
Subject: Re: [Discuss] Management cluster / Zookeeper holding locks
To: dev@cloudstack.apache.org
Cc: users@cloudstack.apache.org

We added the ZK lock to fix this issue, but we will remove all the current
locks in favor of the ZK one. The ZK lock is already encapsulated in a
project with an interface, but more work should be done to have a proper
lock interface that could be implemented with whichever "tool" you want:
a DB lock for simplicity, or ZK for more advanced scenarios.

@Daan: you will need to add the ZK libraries in CS and have a running ZK
server somewhere. The configuration value is read from server.properties.
If the line is empty, the ZK client is not created and any lock request
returns immediately (without holding any lock).

@Rafael: ZK is pretty easy to set up and keep running, as long as you don't
put too much data in it. For our scenario here, with only locks, it's easy.
ZK would only be the gatekeeper for locks in the code, ensuring that
multiple JVMs can request a true lock. From the code point of view, you
open a connection to a ZK node (any node of the cluster) and create a new
InterProcessSemaphoreMutex, which handles the locking mechanism (see the
rough sketches at the end of this message).

On Mon, Dec 18, 2017 at 10:24 AM, Ivan Kudryavtsev wrote:

> Rafael,
>
> - It's easy to configure and run ZK, either as a single node or as a
> cluster.
> - Zookeeper should replace the MySQL locking mechanism used inside the
> ACS code (the places where ACS locks tables or rows).
>
> On the other hand, I don't think that moving from MySQL locks to ZK locks
> is an easy and light (or even implementable) way.
>
> 2017-12-18 16:20 GMT+07:00 Rafael Weingärtner:
>
> > How hard is it to configure Zookeeper and get everything up and running?
> > BTW: what would Zookeeper be managing? CloudStack management servers or
> > MySQL nodes?
> >
> > On Mon, Dec 18, 2017 at 7:13 AM, Ivan Kudryavtsev <
> > kudryavtsev_ia@bw-sw.com> wrote:
> >
> > > Hello Marc-Aurèle, I strongly believe that all MySQL locks should be
> > > removed in favour of a true DLM solution like Zookeeper. The
> > > performance of a 3-node ZK ensemble should be enough to hold up to
> > > 1000-2000 locks per second, and it helps the move to a truly
> > > clustered MySQL like Galera without a single master server.
> > >
> > > 2017-12-18 15:33 GMT+07:00 Marc-Aurèle Brothier:
> > >
> > > > Hi everyone,
> > > >
> > > > I was wondering how many of you are running CloudStack with a
> > > > cluster of management servers. I would think most of you, but it
> > > > would be nice to hear everyone's voice. And do you get hosts going
> > > > over their capacity limits?
> > > >
> > > > We discovered that during VM allocation, if you get a lot of
> > > > parallel requests to create new VMs, most notably with large
> > > > profiles, the capacity increase happens too long after the host
> > > > capacity checks, and hosts end up going over their capacity limits.
> > > > To detail the steps: the deployment planner checks the cluster/host
> > > > capacity and picks one deployment plan (zone, cluster, host). The
> > > > plan is stored in the database under a VMwork job, and another
> > > > thread picks up that entry and starts the deployment, increasing
> > > > the host capacity and sending the commands. There is a time gap of
> > > > a couple of seconds between the host being picked and the capacity
> > > > increase for that host, which is more than enough to go over the
> > > > capacity on one or more hosts. A few VMwork jobs can be added to
> > > > the DB queue targeting the same host before one gets picked up.
> > > >
> > > > To fix this issue, we're using Zookeeper to act as the multi-JVM
> > > > lock manager, thanks to the Curator library (
> > > > https://curator.apache.org/curator-recipes/shared-lock.html). We
> > > > also changed the point at which the capacity is increased: it now
> > > > happens right after the deployment plan is found, inside the
> > > > Zookeeper lock. This ensures we don't go over the capacity of any
> > > > host, and it has proven effective over the past month in our
> > > > management server cluster.
> > > >
> > > > This adds another potential requirement, which should be discussed
> > > > before proposing a PR. Today the code also works seamlessly without
> > > > ZK, to ensure it's not a hard requirement, for example in a lab.
> > > >
> > > > Comments?
> > > >
> > > > Kind regards,
> > > > Marc-Aurèle
> > >
> > > --
> > > With best regards, Ivan Kudryavtsev
> > > Bitworks Software, Ltd.
> > > Cell: +7-923-414-1515
> > > WWW: http://bitworks.software/
> >
> > --
> > Rafael Weingärtner
>
> --
> With best regards, Ivan Kudryavtsev
> Bitworks Software, Ltd.
> Cell: +7-923-414-1515
> WWW: http://bitworks.software/
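
PS: for anyone curious, here is roughly what the Curator part looks like.
This is a minimal sketch under assumptions, not the actual patch: the
connection string, the lock path and the checkAndReserveCapacity() helper
are placeholders made up for illustration; only CuratorFrameworkFactory and
InterProcessSemaphoreMutex are real Curator classes.

import java.util.concurrent.TimeUnit;

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.CuratorFrameworkFactory;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex;
import org.apache.curator.retry.ExponentialBackoffRetry;

public class ZkCapacityLockSketch {
    public static void main(String[] args) throws Exception {
        // Connect to any node of the ZK ensemble; in our setup the connection
        // string would come from server.properties (the value below is made up).
        CuratorFramework client = CuratorFrameworkFactory.newClient(
                "zk1:2181,zk2:2181,zk3:2181",
                new ExponentialBackoffRetry(1000, 3));
        client.start();

        // One lock path per host keeps the critical section small
        // (the path naming scheme is hypothetical).
        InterProcessSemaphoreMutex lock =
                new InterProcessSemaphoreMutex(client, "/cloudstack/locks/host/42");

        if (lock.acquire(10, TimeUnit.SECONDS)) {
            try {
                // Critical section: check the host capacity and increase it
                // before another management server can pick the same host.
                // checkAndReserveCapacity(hostId, serviceOffering); // hypothetical helper
            } finally {
                lock.release();
            }
        }
        client.close();
    }
}

Because the lock path is per host, two management servers deploying on the
same host serialize on the same znode, while deployments on different hosts
stay parallel.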
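
And to make the "works without ZK" point concrete, here is a rough sketch of
the lock interface idea mentioned above. All names are hypothetical, not
existing CloudStack classes: when the ZK entry in server.properties is empty
you would wire in the no-op implementation, so any acquire returns
immediately without holding anything; otherwise the ZK-backed one takes over.

import org.apache.curator.framework.CuratorFramework;
import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex;

// Hypothetical lock abstraction -- illustrative names only.
interface DistributedLock {
    void acquire() throws Exception;
    void release() throws Exception;
}

// Used when no ZK server is configured: acquiring always "succeeds"
// immediately and holds nothing, so a lab setup keeps working without ZK.
class NoOpDistributedLock implements DistributedLock {
    public void acquire() { /* nothing to do */ }
    public void release() { /* nothing to do */ }
}

// ZK-backed implementation delegating to Curator's InterProcessSemaphoreMutex.
class ZkDistributedLock implements DistributedLock {
    private final InterProcessSemaphoreMutex mutex;

    ZkDistributedLock(CuratorFramework client, String path) {
        this.mutex = new InterProcessSemaphoreMutex(client, path);
    }

    public void acquire() throws Exception {
        mutex.acquire();
    }

    public void release() throws Exception {
        mutex.release();
    }
}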