cloudstack-dev mailing list archives

From Ivan Kudryavtsev <>
Subject Re: [Discuss] Management cluster / Zookeeper holding locks
Date Mon, 18 Dec 2017 09:13:54 GMT
Hello, Marc-Aurèle, I strongly believe that all MySQL locks should be
removed in favour of a true DLM solution like Zookeeper. The performance of
a 3-node ZK ensemble should be enough to handle up to 1000-2000 locks per
second, and it would help us move to a truly clustered MySQL like Galera,
without a single master server.
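For readers less familiar with ZooKeeper as a DLM: its documented lock recipe has each client create an ephemeral, sequential child node under a lock path. The client with the lowest sequence number holds the lock, and every other client watches only the node just before its own, so a release (or an expired session removing the ephemeral node) wakes exactly one waiter. The toy model below is an in-memory illustration of that ordering only, not a ZooKeeper client; the class name and owner IDs are made up for the example.

```python
import itertools


class ToyZkLockPath:
    """In-memory stand-in for a ZooKeeper lock path (hypothetical, not a real client)."""

    def __init__(self):
        self._seq = itertools.count()
        self._children = {}  # sequence number -> owner (e.g. a management server id)

    def enqueue(self, owner):
        # Like creating an EPHEMERAL_SEQUENTIAL child: ZK appends an increasing suffix.
        seq = next(self._seq)
        self._children[seq] = owner
        return seq

    def holder(self):
        # The child with the lowest sequence number holds the lock.
        return self._children[min(self._children)] if self._children else None

    def predecessor(self, seq):
        # A waiter watches only the node just before its own (avoids a herd effect).
        lower = [s for s in self._children if s < seq]
        return max(lower) if lower else None

    def release(self, seq):
        # Deleting the node, or session expiry removing it, hands the lock on.
        del self._children[seq]


path = ToyZkLockPath()
a = path.enqueue("mgmt-server-1")
b = path.enqueue("mgmt-server-2")
print(path.holder())              # mgmt-server-1
print(path.predecessor(b) == a)   # True: server 2 watches server 1's node
path.release(a)                   # server 1 is done, or its session expired
print(path.holder())              # mgmt-server-2
```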

2017-12-18 15:33 GMT+07:00 Marc-Aurèle Brothier <>:

> Hi everyone,
> I was wondering how many of you are running CloudStack with a cluster of
> management servers. I would think most of you, but it would be nice to hear
> everyone's voice. And do you get hosts going over their capacity limits?
> We discovered that during VM allocation, if you get a lot of parallel
> requests to create new VMs, most notably with large profiles, the capacity
> increase happens too long after the host capacity checks, which results in
> hosts going over their capacity limits. To detail the steps: the deployment
> planner checks cluster/host capacity and picks one deployment plan
> (zone, cluster, host). The plan is stored in the database as a VMwork
> job, and another thread picks up that entry and starts the deployment,
> increasing the host capacity and sending the commands. There is a gap of
> a couple of seconds between the host being picked and the capacity
> increase for that host, which is more than enough to go over the capacity
> on one or more hosts: several VMwork jobs targeting the same host can be
> added to the DB queue before one of them gets picked up.
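The race described above can be reduced to a deterministic toy model. This is an illustration under simplifying assumptions (a single host, CPU as the only capacity dimension, and hypothetical names such as `Host` and `plan_then_deploy_later`), not CloudStack's planner code: every request is checked against capacity before any capacity has been reserved, so all of them pass.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    total_cpu: int
    used_cpu: int = 0

    def has_room(self, cpu):
        return self.used_cpu + cpu <= self.total_cpu


def plan_then_deploy_later(host, requests):
    """Model of the reported race: plans are made before any capacity is increased."""
    queued_jobs = []
    # Phase 1: the planner checks capacity for each request and queues a job.
    for cpu in requests:
        if host.has_room(cpu):  # nothing reserved yet, so every check passes
            queued_jobs.append(cpu)
    # Phase 2 (seconds later): worker threads pick up the jobs and increase capacity.
    for cpu in queued_jobs:
        host.used_cpu += cpu
    return queued_jobs


host = Host("host-1", total_cpu=16)
accepted = plan_then_deploy_later(host, [8, 8, 8])
print(len(accepted), host.used_cpu)  # 3 24 -> 24 CPUs used on a 16-CPU host
```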
> To fix this issue, we're using Zookeeper as the multi-JVM lock
> manager, thanks to the Curator library (…). We also
> changed when the capacity is increased: it now happens right after
> the deployment plan is found, inside the Zookeeper lock. This
> ensures we don't go over the capacity of any host, and it has proven
> effective for a month in our management server cluster.
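A minimal sketch of the fix described above, using the same kind of toy `Host` model (redefined so the block stands alone): the capacity check and the capacity increase happen in one critical section, so a later request sees the earlier reservation. Here a `threading.Lock` stands in for the Zookeeper lock; it only serialises threads inside one process, whereas the cross-JVM version would use a Curator lock recipe such as `InterProcessMutex` (the message doesn't say which recipe was used). All names are hypothetical.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Host:
    def __init__(self, name, total_cpu):
        self.name = name
        self.total_cpu = total_cpu
        self.used_cpu = 0


# Stand-in for the Zookeeper lock shared by all management servers;
# a process-local lock only protects threads within this process.
capacity_lock = threading.Lock()


def plan_and_reserve(host, cpu):
    """Check capacity and increase it inside the same critical section."""
    with capacity_lock:
        if host.used_cpu + cpu > host.total_cpu:
            return False  # the planner would try another host or fail the request
        host.used_cpu += cpu  # reserve now, before the VMwork job is queued
        return True


host = Host("host-1", total_cpu=16)
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(lambda cpu: plan_and_reserve(host, cpu), [8] * 5))

print(sum(results), host.used_cpu)  # 2 16
```

Whatever order the threads run in, exactly two 8-CPU requests fit and the host never exceeds 16.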
> This adds another potential requirement, which should be discussed before
> proposing a PR. Today the code also works seamlessly without ZK, so that
> it's not a hard requirement, for example in a lab.
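One way to keep ZK optional, sketched in Python with the `kazoo` ZooKeeper client purely for illustration: choose the lock implementation from configuration and fall back to a process-local lock when no ensemble is configured. The function name and the lock path are hypothetical, `kazoo` is imported only on the ZooKeeper branch, and the local fallback is only safe when a single management server is running (as in a lab).

```python
import threading


def make_capacity_lock(zk_hosts=None, lock_path="/cloudstack/capacity-lock"):
    """Return a lock usable in a `with` block (hypothetical helper).

    With zk_hosts set, return a ZooKeeper-backed lock shared by every
    management server; without it, fall back to a process-local lock.
    """
    if not zk_hosts:
        # No ensemble configured: fine for a single management server only.
        return threading.Lock()
    from kazoo.client import KazooClient  # needed only when ZooKeeper is configured
    client = KazooClient(hosts=zk_hosts)
    client.start()  # a real implementation would also manage client.stop()
    return client.Lock(lock_path)


lock = make_capacity_lock()  # no ZK configured, e.g. in a lab
with lock:
    pass  # capacity check + capacity increase would go here
```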
> Comments?
> Kind regards,
> Marc-Aurèle

With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: <>
