zookeeper-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Maciej SmoleĊ„ski <jezd...@gmail.com>
Subject Re: Leader election
Date Fri, 07 Dec 2018 10:11:14 GMT
On Fri, Dec 7, 2018 at 3:03 AM Michael Borokhovich <michaelbor@gmail.com>

> We are planning to run Zookeeper nodes embedded with the client nodes.
> I.e., each client runs also a ZK node. So, network partition will
> disconnect a ZK node and not only the client.
> My concern is about the following statement from the ZK documentation:
> "Timeliness: The clients view of the system is guaranteed to be up-to-date
> within a certain time bound. (*On the order of tens of seconds.*) Either
> system changes will be seen by a client within this bound, or the client
> will detect a service outage."

This is related to the fact that ZooKeeper server handles reads from its
local state - without communicating with other ZooKeeper servers.
This design ensures scalability for read dominated workloads.
In this approach client might receive data which is not up to date (it
might not contain updates from other ZooKeeper servers (quorum)).
Parameter 'syncLimit' describes how often ZooKeeper server
synchronizes/updates its local state to global state.
Client read operation will retrieve data from state not older then
described by 'syncLimit'.

However ZooKeeper client can always force to retrieve data which is up to
It needs to issue 'sync' command to ZooKeeper server before issueing 'read'.
With 'sync' ZooKeeper server with synchronize its local state with global
Later 'read' will be handled from updated state.
Client should be careful here - so that it communicates with the same
ZooKeeper server for both 'sync' and 'read'.

> What are these "*tens of seconds*"? Can we reduce this time by configuring
> "syncLimit" and "tickTime" to let's say 5 seconds? Can we have a strong
> guarantee on this time bound?

As describe above - you might use 'sync'+'read' to avoid this problem.

> On Thu, Dec 6, 2018 at 1:05 PM Jordan Zimmerman <
> jordan@jordanzimmerman.com>
> wrote:
> > > Old service leader will detect network partition max 15 seconds after
> it
> > > happened.
> >
> > If the old service leader is in a very long GC it will not detect the
> > partition. In the face of VM pauses, etc. it's not possible to avoid 2
> > leaders for a short period of time.
> >
> > -JZ

  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message