zookeeper-dev mailing list archives

From Patrick Hunt <ph...@apache.org>
Subject Re: KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum
Date Fri, 02 Aug 2019 18:48:50 GMT
Michael, I think you are describing subscribe. Is it this?
https://issues.apache.org/jira/browse/ZOOKEEPER-153
Wasn't there some work done to keep tlogs around for a while? Or am I
misremembering? (fb folks?)

I'll also add that we haven't done any benchmarking in quite some time. It
would be interesting to collect a few of these use cases from the
community, especially from downstream projects, evaluate performance, and
see what we can address.

Patrick

On Fri, Aug 2, 2019 at 11:03 AM Michael Han <hanm@apache.org> wrote:

> Folks,
>
> Some of you might have already seen this. Comments?
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum
>
>
> What caught my eye:
>
> *Worse still, although ZooKeeper is the store of record, the state in
> ZooKeeper often doesn't match the state that is held in memory in the
> controller.  For example, when a partition leader changes its ISR in ZK,
> the controller will typically not learn about these changes for many
> seconds.  There is no generic way for the controller to follow the
> ZooKeeper event log.  Although the controller can set one-shot watches, the
> number of watches is limited for performance reasons.  When a watch
> triggers, it doesn't tell the controller the current state-- only that the
> state has changed.  By the time the controller re-reads the znode and sets
> up a new watch, the state may have changed from what it was when the watch
> originally fired.  If there is no watch set, the controller may not learn
> about the change at all.  In some cases, restarting the controller is the
> only way to resolve the discrepancy.*
>
> I've seen some similar ZooKeeper use cases that ended up as described
> here. How can ZooKeeper solve this? It seems to me that the only
> solution is to provide linearizable reads on watched operations. Thoughts?
>
> Michael.
>
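
To make the race in the quoted KIP-500 paragraph concrete, here is a minimal sketch of one-shot watch semantics. It is a toy in-memory model, not the ZooKeeper client API: `ToyStore`, `ToyController`, the `/isr` path, and the ISR values are all hypothetical names chosen for illustration. It assumes only what the paragraph states: a watch fires once, the notification carries no data, and the watcher must re-read and re-register.

```python
from typing import Callable, Dict, List, Optional, Tuple


class ToyStore:
    """Toy stand-in for a znode store with one-shot watches (not the real API)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watches: Dict[str, List[Callable[[str], None]]] = {}
        self.log: List[Tuple[str, str]] = []  # ordered record of every write

    def set(self, path: str, value: str) -> None:
        self._data[path] = value
        self.log.append((path, value))
        # One-shot: the watches are removed before they fire.
        for callback in self._watches.pop(path, []):
            callback(path)  # the notification carries the path, not the value

    def get(self, path: str,
            watch: Optional[Callable[[str], None]] = None) -> Optional[str]:
        if watch is not None:
            self._watches.setdefault(path, []).append(watch)
        return self._data.get(path)


class ToyController:
    """Hypothetical watcher that re-reads and re-registers after each event."""

    def __init__(self, store: ToyStore, path: str) -> None:
        self.store = store
        self.path = path
        self.pending = False
        # Initial read, which also registers the first watch.
        self.seen: List[Optional[str]] = [store.get(path, watch=self._on_change)]

    def _on_change(self, path: str) -> None:
        # All the controller learns is that something changed.
        self.pending = True

    def poll(self) -> None:
        # Models the delay between the notification and the re-read.
        if self.pending:
            self.pending = False
            self.seen.append(self.store.get(self.path, watch=self._on_change))


store = ToyStore()
store.set("/isr", "1,2,3")
controller = ToyController(store, "/isr")

store.set("/isr", "1,2")  # fires the only registered watch
store.set("/isr", "1")    # no watch is registered, so no notification
store.set("/isr", "1,3")  # same: the controller has not re-registered yet
controller.poll()         # re-read and set up a new watch

print(controller.seen)
print([value for _, value in store.log])
```

In this model the controller observes only the first and last values, while `store.log` holds all four writes in order. A reader that could follow that ordered log (roughly what a subscribe mechanism would offer) would see every intermediate state.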
