zookeeper-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum
Date Fri, 02 Aug 2019 19:44:10 GMT
The core issue in these situations in my experience is that having the
quorum as a separate service can be a pain point. This misunderstanding
about how watches work and why they don't provide the data is just a
symptom of this. Having an integrated quorum is very attractive from the
point of view of management and tighter integration with the record of

If I had it all to do over again, though, I think I would still opt for
quorum outside rather than quorum as a library. There are management
burdens, but many of those management burdens are implicit in the fact that
managing the state of the system is different from managing the system or
doing the stuff the system does. Pulling the quorum system into the
do-stuff system doesn't actually make life all that much easier even if it
does simplify the installer.

The countervailing risk that you are likely to get a quorum system wrong is
really significant. Having a battle-tested (some might say battle-scarred)
system like ZK is quite a virtue since you can have a different level of
confidence in it than something you whipped up last week.

On Fri, Aug 2, 2019 at 11:49 AM Patrick Hunt <phunt@apache.org> wrote:

> Michael I think you are describing subscribe - this?
> https://issues.apache.org/jira/browse/ZOOKEEPER-153
> wasn't there some work done to keep tlogs around for a while? Or am I
> misremembering? (fb folks?)
> I'll also add that we haven't done any benchmarking in quite some time. It
> would be interesting to collect a few of these use cases from the
> community, esp downstreams, and evaluate performance, see if we can
> address.
> Patrick
> On Fri, Aug 2, 2019 at 11:03 AM Michael Han <hanm@apache.org> wrote:
> > Folks,
> >
> > Some of you might have already seen this. Comments?
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
> >
> >
> > What caught my eye is:
> >
> > *Worse still, although ZooKeeper is the store of record, the state in
> > ZooKeeper often doesn't match the state that is held in memory in the
> > controller.  For example, when a partition leader changes its ISR in ZK,
> > the controller will typically not learn about these changes for many
> > seconds.  There is no generic way for the controller to follow the
> > ZooKeeper event log.  Although the controller can set one-shot watches,
> > the number of watches is limited for performance reasons.  When a watch
> > triggers, it doesn't tell the controller the current state-- only that
> > the state has changed.  By the time the controller re-reads the znode and
> > sets up a new watch, the state may have changed from what it was when the
> > watch originally fired.  If there is no watch set, the controller may not
> > learn about the change at all.  In some cases, restarting the controller
> > is the only way to resolve the discrepancy.*
> >
> > I've seen some similar zookeeper use cases that ended up like what's
> > described here. How can ZooKeeper solve this? It seems to me that the
> > only solution is to provide linearizable read on watched operations.
> > Thoughts?
> >
> > Michael.
> >
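
For readers following the thread, here is a toy sketch in Python of the one-shot watch behavior that the quoted KIP-500 passage describes. ToyZnode and its get/set methods are hypothetical in-memory stand-ins invented for illustration, not the ZooKeeper client API. The sketch only shows why a notification that carries no data, combined with re-registration on re-read, lets an observer skip intermediate states.

```python
class ToyZnode:
    """Hypothetical in-memory stand-in for a znode with one-shot watches."""

    def __init__(self, data):
        self.data = data
        self.version = 0
        self._watchers = []

    def get(self, watch=None):
        # Reading may register a one-shot watch, as an assumption of this toy model.
        if watch is not None:
            self._watchers.append(watch)
        return self.data, self.version

    def set(self, data):
        self.data = data
        self.version += 1
        # One-shot: every registered watch fires once and is then dropped.
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback()  # the notification carries no data


notifications = []
seen = []


def on_change():
    # The observer only learns that something changed, not what it changed to.
    notifications.append("changed")


node = ToyZnode("isr=1,2,3")
seen.append(node.get(watch=on_change))  # initial read, watch registered

node.set("isr=1,2")  # fires the watch; no watch is registered afterwards
node.set("isr=1")    # nobody is watching, so no notification is sent

# The observer now gets around to handling the single notification:
# it re-reads the znode and registers a new watch.
pending = len(notifications)
for _ in range(pending):
    seen.append(node.get(watch=on_change))

print(seen)                # the intermediate value "isr=1,2" never appears
print(len(notifications))  # two writes produced one notification
```

In this toy model the version counter jumps from 0 to 2 between the two reads, which is the only hint the observer gets that it missed an update.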
