cloudstack-dev mailing list archives

From Ivan Kudryavtsev <kudryavtsev...@bw-sw.com>
Subject Re: [Discuss] Management cluster / Zookeeper holding locks
Date Mon, 18 Dec 2017 09:24:55 GMT
Rafael,

- It's easy to configure and run ZK, either as a single node or as a
cluster (see the sample config below).
- Zookeeper should replace the MySQL locking mechanism used inside the
ACS code (the places where ACS locks tables or rows).
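
For the record, the entire configuration of a 3-node ensemble fits in
one small zoo.cfg (the hostnames below are placeholders):

  tickTime=2000
  initLimit=10
  syncLimit=5
  dataDir=/var/lib/zookeeper
  clientPort=2181
  server.1=zk1.example.com:2888:3888
  server.2=zk2.example.com:2888:3888
  server.3=zk3.example.com:2888:3888

plus a myid file (containing 1, 2 or 3) in dataDir on each node.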

On the other hand, I don't think that moving from MySQL locks to ZK
locks is an easy, lightweight, or even implementable path.

2017-12-18 16:20 GMT+07:00 Rafael Weingärtner <rafaelweingartner@gmail.com>:

> How hard is it to configure Zookeeper and get everything up and running?
> BTW: what would Zookeeper be managing? The CloudStack management servers
> or the MySQL nodes?
>
> On Mon, Dec 18, 2017 at 7:13 AM, Ivan Kudryavtsev <kudryavtsev_ia@bw-sw.com> wrote:
>
> > Hello, Marc-Aurèle, I strongly believe that all MySQL locks should be
> > removed in favour of a true DLM solution like Zookeeper. The
> > performance of a 3-node ZK ensemble should be enough to handle up to
> > 1000-2000 locks per second, and it would help us move to a truly
> > clustered MySQL setup like Galera, without a single master server.
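> >
> > For illustration, acquiring such a distributed lock through Curator
> > takes only a few lines (a sketch; the connection string and the lock
> > path are made up):
> >
> > import java.util.concurrent.TimeUnit;
> >
> > import org.apache.curator.framework.CuratorFramework;
> > import org.apache.curator.framework.CuratorFrameworkFactory;
> > import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex;
> > import org.apache.curator.retry.ExponentialBackoffRetry;
> >
> > public class ZkLockSketch {
> >     public static void main(String[] args) throws Exception {
> >         // Connect to the 3-node ensemble with a retry policy.
> >         CuratorFramework client = CuratorFrameworkFactory.newClient(
> >                 "zk1:2181,zk2:2181,zk3:2181",
> >                 new ExponentialBackoffRetry(1000, 3));
> >         client.start();
> >
> >         // One lock znode per protected resource, e.g. per host row.
> >         InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(
> >                 client, "/cloudstack/locks/host/42");
> >         if (lock.acquire(30, TimeUnit.SECONDS)) {
> >             try {
> >                 // Critical section: whatever used to run under the
> >                 // MySQL row/table lock.
> >             } finally {
> >                 lock.release();
> >             }
> >         }
> >         client.close();
> >     }
> > }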
> >
> > 2017-12-18 15:33 GMT+07:00 Marc-Aurèle Brothier <marco@exoscale.ch>:
> >
> > > Hi everyone,
> > >
> > > I was wondering how many of you are running CloudStack with a
> > > cluster of management servers. I would think most of you, but it
> > > would be nice to hear everyone's voice. Also, do you see hosts going
> > > over their capacity limits?
> > >
> > > We discovered that during VM allocation, if you get a lot of
> > > parallel requests to create new VMs, most notably with large
> > > profiles, the capacity increase happens too long after the host
> > > capacity checks, and hosts end up going over their capacity limits.
> > > To detail the steps: the deployment planner checks the cluster/host
> > > capacity and picks a deployment plan (zone, cluster, host). The plan
> > > is stored in the database under a VMwork job, and another thread
> > > picks up that entry and starts the deployment, increasing the host
> > > capacity and sending the commands. This leaves a gap of a couple of
> > > seconds between the host being picked and the capacity increase for
> > > that host, which is more than enough to go over the capacity of one
> > > or more hosts: several VMwork jobs targeting the same host can be
> > > added to the DB queue before the first one gets picked up.
> > >
> > > To fix this issue, we're using Zookeeper as the multi-JVM lock
> > > manager, thanks to the Curator library
> > > (https://curator.apache.org/curator-recipes/shared-lock.html). We
> > > also changed the point at which the capacity is increased: it now
> > > happens right after the deployment plan is found, inside the
> > > Zookeeper lock. This ensures we don't go over the capacity of any
> > > host, and it has proven effective over the past month in our
> > > management server cluster.
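> > >
> > > In rough pseudo-Java, the new ordering is the following (class,
> > > method, and helper names are illustrative, not the actual CloudStack
> > > ones):
> > >
> > > import org.apache.curator.framework.CuratorFramework;
> > > import org.apache.curator.framework.recipes.locks.InterProcessSemaphoreMutex;
> > >
> > > class DeploymentSketch {
> > >     private final CuratorFramework zk;
> > >
> > >     DeploymentSketch(CuratorFramework zk) { this.zk = zk; }
> > >
> > >     /** Reserve capacity on the chosen host under a cluster-wide lock. */
> > >     boolean reserveCapacity(long hostId, int cpu, long ramMb) throws Exception {
> > >         InterProcessSemaphoreMutex lock = new InterProcessSemaphoreMutex(
> > >                 zk, "/cloudstack/locks/host/" + hostId);
> > >         lock.acquire();
> > >         try {
> > >             // Re-check while holding the lock: another management
> > >             // server may have reserved capacity since the planner ran.
> > >             if (!hasEnoughCapacity(hostId, cpu, ramMb)) {
> > >                 return false; // the planner must pick another host
> > >             }
> > >             increaseUsedCapacity(hostId, cpu, ramMb);
> > >             return true;
> > >         } finally {
> > >             lock.release();
> > >         }
> > >         // The start commands are then sent outside the lock.
> > >     }
> > >
> > >     // Stand-ins for the real capacity accounting.
> > >     private boolean hasEnoughCapacity(long hostId, int cpu, long ramMb) { return true; }
> > >     private void increaseUsedCapacity(long hostId, int cpu, long ramMb) { }
> > > }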
> > >
> > > This adds another potential requirement, which should be discussed
> > > before proposing a PR. Today the code also works seamlessly without
> > > ZK, so that it's not a hard requirement, for example in a lab.
> > >
> > > Comments?
> > >
> > > Kind regards,
> > > Marc-Aurèle
> > >
> >
> >
> >
> > --
> > With best regards, Ivan Kudryavtsev
> > Bitworks Software, Ltd.
> > Cell: +7-923-414-1515
> > WWW: http://bitworks.software/
> >
>
>
>
> --
> Rafael Weingärtner
>



-- 
With best regards, Ivan Kudryavtsev
Bitworks Software, Ltd.
Cell: +7-923-414-1515
WWW: http://bitworks.software/
