zookeeper-dev mailing list archives

From Michael Han <h...@apache.org>
Subject Re: KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum
Date Fri, 02 Aug 2019 22:51:45 GMT
Hi Pat,

Yes, ZOOKEEPER-153 could help in this case. The gist of the issue is reliable
change notification with data. The linearizable read I had in mind might not
solve this on its own, since it does not address reliably capturing the
change notifications.
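
To make the gap concrete, here is a minimal sketch of the one-shot watch race
in Python. It is a toy in-memory model, not the ZooKeeper client API:
ToyZnode, run_demo, and the ISR strings are hypothetical names made up for
illustration. It only assumes the behavior described in the KIP text quoted
below, namely that a watch fires once, carries no data, and must be re-set by
a fresh read.

```python
# Toy in-memory model of one-shot watches. NOT the ZooKeeper API; every
# name here (ToyZnode, run_demo) is hypothetical and for illustration only.
class ToyZnode:
    def __init__(self, data):
        self.data = data
        self.version = 0
        self._watchers = []  # pending one-shot callbacks

    def get(self, watch=None):
        """Return (data, version); optionally register a one-shot watch."""
        if watch is not None:
            self._watchers.append(watch)
        return self.data, self.version

    def set(self, data):
        """Update the node, then fire and clear any pending watches."""
        self.data = data
        self.version += 1
        fired, self._watchers = self._watchers, []
        for callback in fired:
            callback()  # carries no data, only "something changed"


def run_demo():
    node = ToyZnode("isr=[1,2,3]")
    seen = []     # states the hypothetical controller actually observed
    pending = []  # notifications delivered but not yet processed

    def on_change():
        pending.append("changed")

    # Initial read, which also sets the one-shot watch.
    seen.append(node.get(watch=on_change))

    # Two writes land before the controller processes the notification.
    node.set("isr=[1,2]")  # fires the watch; the watch is now gone
    node.set("isr=[1]")    # no watch registered, so no notification

    # The controller handles the single notification and re-reads.
    while pending:
        pending.pop()
        seen.append(node.get(watch=on_change))
    return seen


if __name__ == "__main__":
    for data, version in run_demo():
        print(version, data)
```

In this model the controller sees versions 0 and 2 but never version 1. A
linearizable read would make the re-read current, but it would not recover
the intermediate change; that part needs a notification that carries the
data.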

>> I'll also add that we haven't done any benchmarking in quite some time.

I think this is a very good point. The existing public benchmarks either
target old versions or are not optimally set up. This creates a gap between
the current scalability and performance of ZK and the existing (usually
negative) public perception. With the many scale and performance improvements
of the last two years, the status quo is very different now.

On Fri, Aug 2, 2019 at 11:49 AM Patrick Hunt <phunt@apache.org> wrote:

> Michael I think you are describing subscribe - this?
> https://issues.apache.org/jira/browse/ZOOKEEPER-153
> Wasn't there some work done to keep tlogs around for a while? Or am I
> misremembering? (fb folks?)
>
> I'll also add that we haven't done any benchmarking in quite some time. It
> would be interesting to collect a few of these use cases from the
> community, esp downstreams, and evaluate performance, see if we can
> address.
>
> Patrick
>
> On Fri, Aug 2, 2019 at 11:03 AM Michael Han <hanm@apache.org> wrote:
>
> > Folks,
> >
> > Some of you might have already seen this. Comments?
> >
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
> >
> >
> > What caught my eye is:
> >
> > *Worse still, although ZooKeeper is the store of record, the state in
> > ZooKeeper often doesn't match the state that is held in memory in the
> > controller.  For example, when a partition leader changes its ISR in ZK,
> > the controller will typically not learn about these changes for many
> > seconds.  There is no generic way for the controller to follow the
> > ZooKeeper event log.  Although the controller can set one-shot watches, the
> > number of watches is limited for performance reasons.  When a watch
> > triggers, it doesn't tell the controller the current state-- only that the
> > state has changed.  By the time the controller re-reads the znode and sets
> > up a new watch, the state may have changed from what it was when the watch
> > originally fired.  If there is no watch set, the controller may not learn
> > about the change at all.  In some cases, restarting the controller is the
> > only way to resolve the discrepancy.*
> >
> > I've seen some similar ZooKeeper use cases that ended up like what's
> > described here. How can ZooKeeper solve this? It seems to me that the only
> > solution is to provide linearizable reads on watched operations. Thoughts?
> >
> > Michael.
> >
>
