cloudstack-dev mailing list archives

From Daan Hoogland <daan.hoogl...@gmail.com>
Subject Re: [Discuss] Management cluster / Zookeeper holding locks
Date Mon, 18 Dec 2017 09:12:59 GMT
Are you proposing to add ZooKeeper as an optional requirement, Marc-Aurèle,
or just Curator? And what is the decision mechanism for including it or not?

On Mon, Dec 18, 2017 at 9:33 AM, Marc-Aurèle Brothier <marco@exoscale.ch>
wrote:

> Hi everyone,
>
> I was wondering how many of you are running CloudStack with a cluster of
> management servers. I would think most of you, but it would be nice to hear
> everyone's voice. And do you get hosts going over their capacity limits?
>
> We discovered that during VM allocation, if you get a lot of parallel
> requests to create new VMs, most notably with large profiles, the capacity
> increase is done too long after the host capacity checks, which results in
> hosts going over their capacity limits. To detail the steps: the deployment
> planner checks for cluster/host capacity and picks one deployment plan
> (zone, cluster, host). The plan is stored in the database under a VMwork
> job, and another thread picks up that entry and starts the deployment,
> increasing the host capacity and sending the commands. There is a time
> gap of a couple of seconds between the host being picked and the capacity
> increase for that host, which is more than enough to go over the capacity
> on one or more hosts. Several VMwork jobs targeting the same host can be
> added to the DB queue before one gets picked up.
>
> To fix this issue, we're using ZooKeeper as the multi-JVM lock manager,
> thanks to the Curator library (
> https://curator.apache.org/curator-recipes/shared-lock.html). We also
> changed the time at which the capacity is increased: it now happens pretty
> much right after the deployment plan is found, inside the ZooKeeper lock.
> This ensures we don't go over the capacity of any host, and it has proven
> effective over the past month in our management server cluster.
>
> This adds another potential requirement, which should be discussed before
> proposing a PR. Today the code also works seamlessly without ZK, so that
> it's not a hard requirement, for example in a lab.
>
> Comments?
>
> Kind regards,
> Marc-Aurèle
>
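
For readers following the thread, the idea in the quoted message (check the
host capacity and increase it inside one lock, so that no other request can
slip in between) can be sketched roughly as below. This is only an
illustration under stated assumptions, not the code Marc-Aurèle describes:
HostCapacity, HostLock, LocalHostLock, tryReserve and simulate are all
hypothetical names, and the lock here is a plain in-JVM ReentrantLock. In a
management server cluster, the HostLock interface would presumably be backed
by a ZooKeeper lock such as Curator's InterProcessMutex, which offers
acquire() and release() calls of a similar shape.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical sketch; none of these names come from CloudStack or Curator. */
class HostCapacity {

    /** Minimal lock abstraction so a ZooKeeper-backed lock could be swapped in. */
    interface HostLock {
        void acquire() throws Exception;
        void release() throws Exception;
    }

    /** Single-JVM fallback, e.g. for a lab setup without ZooKeeper (assumption). */
    static final class LocalHostLock implements HostLock {
        private final ReentrantLock lock = new ReentrantLock();
        @Override public void acquire() { lock.lock(); }
        @Override public void release() { lock.unlock(); }
    }

    private final int totalMb;
    private final HostLock lock;
    private int usedMb; // only read or written while holding the lock

    HostCapacity(int totalMb, HostLock lock) {
        this.totalMb = totalMb;
        this.lock = lock;
    }

    /** Checks the capacity and reserves it in the same critical section. */
    boolean tryReserve(int requestedMb) throws Exception {
        lock.acquire();
        try {
            if (usedMb + requestedMb > totalMb) {
                return false; // the host would go over its limit
            }
            usedMb += requestedMb; // increase before the lock is released
            return true;
        } finally {
            lock.release();
        }
    }

    int usedMb() throws Exception {
        lock.acquire();
        try {
            return usedMb;
        } finally {
            lock.release();
        }
    }

    /** Fires parallel requests at one host and returns how many were accepted. */
    static int simulate(HostCapacity host, int requests, int sizeMb)
            throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(8);
        AtomicInteger accepted = new AtomicInteger();
        for (int i = 0; i < requests; i++) {
            // Callable form, so the checked exception from tryReserve is allowed.
            pool.submit(() -> {
                if (host.tryReserve(sizeMb)) {
                    accepted.incrementAndGet();
                }
                return null;
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return accepted.get();
    }
}
```

With the check and the increase under the same lock, the number of accepted
requests is bounded by the host capacity no matter how many arrive in
parallel. Keeping the lock behind a small interface is one way to make
ZooKeeper optional, in the spirit of the "works without ZK" remark above.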



-- 
Daan
