zookeeper-user mailing list archives

From Camille Fournier <cami...@apache.org>
Subject Re: Distribution Problems With Multiple Zookeeper Clients
Date Thu, 17 May 2012 18:21:55 GMT
The below is written assuming that all clients are seeing all events, but
then race to get a lock of some sort to do the work, and the same 10 always
win the lock and do the work. If in fact not all of your clients are even
getting all the events, that's another problem.

So here's what I think happens, although other devs who know this code
better may prove me wrong. When a client connects to a server and sets a
watch on a particular path, that member of the ZK quorum adds the watch to a
WatchManager. Under the hood, the WatchManager keeps a HashSet of the
watchers registered for that path. When an event happens on that path, the
server iterates through the watchers for that path and sends each of them
the watch notification.
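
For illustration only, here is a very rough sketch of that bookkeeping; this
is the shape of the idea, not the actual ZooKeeper server source:

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;

    // Each quorum member keeps a per-path set of the watchers registered
    // with it; when something changes on a path it walks that set and
    // fires the notifications one by one.
    class SimpleWatchManager {
        private final Map<String, Set<Watcher>> watchTable = new HashMap<>();

        synchronized void addWatch(String path, Watcher watcher) {
            watchTable.computeIfAbsent(path, p -> new HashSet<>()).add(watcher);
        }

        synchronized void triggerWatch(String path, WatchedEvent event) {
            Set<Watcher> watchers = watchTable.remove(path); // watches fire once
            if (watchers == null) {
                return;
            }
            for (Watcher w : watchers) {
                w.process(event); // each registered client gets the notification
            }
        }
    }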
It's quite possible that if your events are infrequent and/or your client
machines aren't heavily loaded, the first few clients that registered the
watch on each quorum member will receive and process the notification first,
simply because their notifications were sent first. If your code resets the
watch immediately upon receiving the notification, those same clients will
also always be the first to re-register the watch for that path. They always
win the race, and thus always do all the work.
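
In code, the pattern I'm describing looks roughly like this; the lock helper
and the path are made up, your client will differ:

    import java.util.List;
    import org.apache.zookeeper.KeeperException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // The race-prone pattern: on every notification, immediately re-register
    // the watch and immediately try to grab the lock. The clients whose
    // notifications arrive first keep winning.
    class EagerWatcher implements Watcher {
        private final ZooKeeper zk;

        EagerWatcher(ZooKeeper zk) { this.zk = zk; }

        public void process(WatchedEvent event) {
            try {
                // Re-register the watch right away...
                List<String> children = zk.getChildren("/data/345", this);
                // ...and race for the work lock right away.
                if (tryAcquireLock()) {
                    doWork(children);
                }
            } catch (KeeperException | InterruptedException e) {
                // handle / retry
            }
        }

        private boolean tryAcquireLock() { /* lock recipe of your choice */ return false; }
        private void doWork(List<String> children) { /* process the child nodes */ }
    }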
In general, the indication is that you have more clients than you need for
the work you want to do. If in fact you don't, the right thing to do is
probably to look at how you hand off work and respond to watch notifications
within your client. That is, if a client is already doing some work when it
gets a watch notification, it may not want to race for the lock at all. You
may want to schedule the lock attempt and the subsequent work in a limited
thread pool, so that you know there is a limit of N tasks that can be in
flight on each client, which also caps the maximum load on each server.
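
A rough sketch of what I mean, assuming a plain fixed-size executor; the lock
and work helpers are placeholders for whatever recipe you're using:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    // Instead of racing for the lock inline in the watch callback, hand the
    // "try to get the lock and do some work" step to a bounded pool, so each
    // client never runs more than N tasks at once.
    class ThrottledWatcher implements Watcher {
        private static final int MAX_TASKS = 4; // N, tune per client
        private final ExecutorService pool = Executors.newFixedThreadPool(MAX_TASKS);
        private final ZooKeeper zk;

        ThrottledWatcher(ZooKeeper zk) { this.zk = zk; }

        public void process(WatchedEvent event) {
            try {
                zk.getChildren("/data/345", this); // reset the watch
            } catch (Exception e) {
                // handle / retry
            }
            // Queue the lock attempt; at most MAX_TASKS run concurrently,
            // which also caps the load this client puts on the ensemble.
            pool.submit(() -> {
                if (tryAcquireLock()) {
                    doWork();
                }
            });
        }

        private boolean tryAcquireLock() { /* lock recipe of your choice */ return false; }
        private void doWork() { /* process the children you claimed */ }
    }

With that in place, a client that is already busy naturally stops competing
for new work until one of its N slots frees up, which tends to spread the
work out across the fleet.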

Does this make sense?


On Wed, May 16, 2012 at 3:46 PM, Narasimha Tadepalli <
Narasimha.Tadepalli@pervasive.com> wrote:

> Hi Camille
> Sorry for the confusion. Yes, it is watches. We have multiple clients
> configured to watch for event changes on the server side. For example, we
> have a data directory of /data/345/text. All 30 clients keep watching for
> event changes under the /data/345 directory; if there is any change, the
> clients need to process and read the child nodes. In this situation, not
> all clients are getting an equal share of the events. I am looking for a
> way to distribute the load equally across all client instances. I hope
> this clarifies things; if not, let me know.
> Thanks
> Narasimha
> -----Original Message-----
> From: cf@renttherunway.com [mailto:cf@renttherunway.com] On Behalf Of
> Camille Fournier
> Sent: Tuesday, May 15, 2012 1:20 PM
> To: user@zookeeper.apache.org
> Subject: Re: Distribution Problems With Multiple Zookeeper Clients
> I'm not sure what you mean by messages. Are you talking about watches? Can
> you describe your clients in more detail?
> Thanks,
> Camille
> On Tue, May 15, 2012 at 1:29 PM, Narasimha Tadepalli <
> Narasimha.Tadepalli@pervasive.com> wrote:
> > Dear All
> >
> > We have a situation where messages are not distributed equally when we
> > have multiple clients listening to one zookeeper cluster. Say we have 30
> > client instances listening to one cluster, and 1000 messages are
> > submitted to the cluster in 30 minutes; I assume each client is supposed
> > to receive approximately 33 messages. But out of the 30, only 10 client
> > instances take the maximum load, and the rest of them get a very low
> > volume of messages. Is this something that can be configured in the
> > zookeeper settings, or do we need to implement a custom solution at our
> > end to distribute the load equally? Before I reinvent the wheel, I am
> > looking around for suggestions in case any of you have faced a similar
> > situation.
> >
> > Thanks
> > Narasimha
> >
> >
