zookeeper-user mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: Zookeeper syncing with Curator
Date Mon, 18 Mar 2019 21:21:55 GMT
> I have to make sure that a read always reflects *all previous writes*
> (which might be performed on another zookeeper server and have not
> reached all other instances).

By doing a sync before reading, as you say, the read should indeed reflect
all *completed* previous writes, i.e., writes that were acknowledged to the
client issuing them,
even if some of the ZK replicas didn't receive them yet.

There is a caveat here, which is that the current implementation of sync
doesn't involve a quorum, and therefore its correctness is dependent on
certain timing assumptions.
Under some (hopefully very rare) leader replacement scenarios, sync might
not reflect the latest data in the system. There is a JIRA to fix this:
https://issues.apache.org/jira/browse/ZOOKEEPER-2136

I believe that if you issue a read after a sync, your read will be queued
at the local ZK server until the sync completes and only executed at that
time, so you don't need to wait for sync completion before enqueuing the
read. The sync does not explicitly transfer data; it's just a way to
"flush" all previous updates from the leader to your local server. So when
the server hears back a sync response, it knows that it also has all
previous updates.
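
For illustration, here is a rough sketch with the plain ZooKeeper client
(the connect string and znode path are made-up placeholders); it enqueues
the read right after the asynchronous sync, without waiting for the sync
callback:

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.Watcher.Event.KeeperState;
    import org.apache.zookeeper.ZooKeeper;

    public class SyncThenRead {
        public static void main(String[] args) throws Exception {
            CountDownLatch connected = new CountDownLatch(1);
            ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> {
                if (event.getState() == KeeperState.SyncConnected) {
                    connected.countDown();
                }
            });
            connected.await();

            // sync() is asynchronous in the plain client; the callback fires
            // once the locally connected server has caught up with the leader.
            zk.sync("/", (rc, syncedPath, ctx) -> { /* not awaited */ }, null);

            // Requests on one session are handled in order, so this read is
            // served only after the sync above has completed on the server.
            byte[] data = zk.getData("/my/znode", false, null);
            System.out.println(new String(data));

            zk.close();
        }
    }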

If I recall correctly, the path you specify in sync currently has no
effect; the sync just brings all the data on your local server up to date.
I doubt that will change, but I may be wrong. Syncing "/" is probably
safest even if something changes in the future.
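
If you are on Curator and would rather wait for the sync acknowledgement
explicitly (one of the questions below), a sketch along these lines should
work; again, the connect string and znode path are placeholders, and it
syncs "/" as suggested above:

    import java.util.concurrent.CountDownLatch;
    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class CuratorSyncThenRead {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "localhost:2181", new ExponentialBackoffRetry(1000, 3));
            client.start();
            client.blockUntilConnected();
            try {
                CountDownLatch synced = new CountDownLatch(1);

                // Sync "/" and wait for the acknowledgement before reading.
                client.sync()
                      .inBackground((c, event) -> synced.countDown())
                      .forPath("/");
                synced.await();

                // The connected server has now seen every update that was
                // committed before the sync was issued.
                byte[] data = client.getData().forPath("/my/znode");
                System.out.println(new String(data));
            } finally {
                client.close();
            }
        }
    }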

> ZooKeeper is an eventually consistent system.

I have to disagree with Jordan a bit here. ZooKeeper is a strongly
consistent system; it is implemented using a variant of Paxos. From the
perspective of an individual replica, sure, the data is propagated
eventually. But the strong/weak consistency of a system is usually
determined by considering the semantics of the API it provides to clients.
If you just do reads+writes, you get "sequential consistency". If you do
sync+read, you'll get linearizability (once the JIRA above is fixed). An
RDBMS provides different abstractions (transactions, queries, secondary
indices); ZK only deals with individual operations and batches, not
interactive transactions. But for these you do get strong semantics in ZK.

> In a dynamic ensemble with lots of concurrent reads/writes there is no
> such thing as a read reflecting all active writes.

I think the key here is that ZK allows you to do strong reads
(sync+read), which will reflect all *completed* writes, not active writes
(I'm not sure how those would be defined). Dynamic reconfiguration was
designed not to change the properties of a static ZK ensemble.


Alex


On Fri, Mar 15, 2019 at 3:55 PM Jordan Zimmerman <jordan@jordanzimmerman.com>
wrote:

> Curator does nothing additional with sync. Sync is a feature of ZooKeeper
> not Curator. Curator merely exposes an API for it.
>
> -JZ
>
> > On Mar 14, 2019, at 9:35 AM, Robin Wolters <rbn.wolters@googlemail.com.INVALID>
> wrote:
> >
> > That is indeed an option, thanks.
> >
> > But for my own curiosity, how does the sync operation behave for Curator?
> > 1) Does it also sync the child nodes of the specified path?
> > 2) Does it sync (transfer data for) a node even if it was up to date?
> > 3) In Curator, would I have to wait for the callback of sync or can I
> > just use sync and go ahead, knowing the next operation is queued?
> >
> > Regards,
> > Robin
> >
> > On Wed, 13 Mar 2019 at 17:07, Jordan Zimmerman
> > <jordan@jordanzimmerman.com> wrote:
> >>
> >> It sounds like you’re describing one of the Barrier recipes. Curator
> has several. I’d look to those as a possible solution.
> >>
> >> ====================
> >> Jordan Zimmerman
> >>
> >>> On Mar 13, 2019, at 9:56 AM, Robin Wolters <rbn.wolters@googlemail.com.invalid>
> wrote:
> >>>
> >>> Thanks for the reply. I understand that this is not possible in
> general.
> >>>
> >>> In my case the read and write are started from the same overarching
> >>> application (but different zookeeper connections and hence possibly
> >>> different nodes).
> >>> I start the read only after I know the write has succeeded, but I
> >>> don't know if it has reached all nodes yet.
> >>> So I expected that a sync gives me the guarantee that the next read
> >>> reflects at least this specific write.
> >>> It's okay if possible further writes are not in yet.
> >>>
> >>> Is this "selective" consistency not possible with my approach?
> >>>
> >>> Best regards,
> >>> Robin
> >>>
> >>> On Wed, 13 Mar 2019 at 15:47, Jordan Zimmerman
> >>> <jordan@jordanzimmerman.com> wrote:
> >>>>
> >>>> ZooKeeper is an eventually consistent system. Reads are always
> consistent in that they reflect previous writes, however it is not possible
> to do what you describe. Reads are fulfilled by the Node your client is
> connected to. Writes are always through the leader Node. In a dynamic
> ensemble with lots of concurrent reads/writes there is no such thing as a read
> reflecting all active writes.
> >>>>
> >>>> You should consider an RDBMS like MySQL instead of something like
> ZooKeeper.
> >>>>
> >>>> ====================
> >>>> Jordan Zimmerman
> >>>>
> >>>>> On Mar 13, 2019, at 6:37 AM, Robin Wolters <
> rbn.wolters@googlemail.com.invalid> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> I use Zookeeper in a cluster setup and some of my read operations need
> >>>>> to be consistent, meaning I have to make sure that a read always
> >>>>> reflects all previous writes (which might be performed on another
> >>>>> zookeeper server and have not reached all other instances).
> >>>>> The idea is to force a sync before those reads to make them
> >>>>> “consistent” reads with:
> >>>>> client.sync().forPath(path)
> >>>>>
> >>>>> For this, I have these questions left:
> >>>>> 1. Do you need to manually await the callback of sync before reading,
> >>>>> or is the next read operation queued until the sync is complete?
> >>>>> 2. How much data is transferred between the nodes in this kind
> >>>>> of manual sync?
> >>>>> a) Does it always transfer and process data from the master server
> >>>>> even if the syncing node is up-to-date on this path - or only for
> >>>>> those nodes that are really out of sync (i.e. sync only possible
> >>>>> deltas)?
> >>>>> b) Does a sync on the path also force the parent nodes to sync?
> >>>>> c) Does a sync on the path also force all child nodes to sync?
> >>>>> d) How would one manually sync the complete data (as the regular
> >>>>> sync does) of a node? Is client.sync().forPath("/") the way to do
> >>>>> this?
> >>>>>
> >>>>> Does anyone have experience with this?
> >>>>>
> >>>>> Best regards,
> >>>>> Robin
>
>
