zookeeper-dev mailing list archives

From Ted Dunning <ted.dunn...@gmail.com>
Subject Re: KIP-500: Replace ZooKeeper with a Self-Managed Metadata Quorum
Date Tue, 13 Aug 2019 19:00:36 GMT
On Tue, Aug 13, 2019 at 10:41 AM Andor Molnar <andor@apache.org> wrote:

> Hmmm... and what should Kafka do if it wants to see all events? Use
> RabbitMQ? :)

Kafka should not be using ZK to store messages.  And synchronization of
configuration doesn't require message passing.

Read the original ZK paper [1] to understand that coordination and reliable
state synchronization in ZK are not message passing. The point is about
providing primitives to allow a read cache of roughly coherent values with
good bounds on the semantics. Kafka should use ZK to coordinate who is
master and should use its own mechanisms to pass messages. A key point is
the performance that can be gained by using a cache coherence strategy
rather than an "all updates" strategy.

Look at the Omega scheduler paper [2] to see the motivation behind the
small-steps style of scheduling and resource allocation.

Seriously, if you are doing things like designating where replicas go and
who has the baton, it is critical to fast forward to things as they are now
rather than things as they were. When the load hits the fan, it is
important to be able to ignore the water under the bridge.
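
To make the "fast forward" point concrete, here is a minimal sketch in Python.
Everything in it is hypothetical and purely illustrative: UpdateLog and
StateCell are toy in-memory classes, not ZK or Kafka APIs, and the
"partition-0-leader" assignments are made-up data. It compares how much work a
reader that fell behind has to do under an "all updates" design versus a
"current state" design.

```python
from dataclasses import dataclass, field


@dataclass
class UpdateLog:
    """Hypothetical 'all updates' store: a reader must replay every entry."""
    entries: list = field(default_factory=list)

    def append(self, assignment):
        self.entries.append(assignment)

    def catch_up(self, offset):
        """Replay everything after `offset`; return (entries replayed, latest state)."""
        pending = self.entries[offset:]
        latest = pending[-1] if pending else None
        return len(pending), latest


@dataclass
class StateCell:
    """Hypothetical 'current state' store: only the latest value is kept."""
    value: object = None
    version: int = 0

    def set(self, assignment):
        self.value = assignment
        self.version += 1

    def read(self):
        return self.value, self.version


def simulate(num_updates):
    """Apply the same updates to both stores while a reader is not looking."""
    log, cell = UpdateLog(), StateCell()
    for i in range(num_updates):
        assignment = {"partition-0-leader": f"broker-{i % 3}"}  # made-up data
        log.append(assignment)
        cell.set(assignment)
    replayed, log_state = log.catch_up(0)  # work grows with the backlog
    cell_state, version = cell.read()      # one read, whatever the backlog
    return replayed, log_state, cell_state, version


if __name__ == "__main__":
    replayed, log_state, cell_state, version = simulate(10_000)
    print(f"log reader replayed {replayed} entries; state reader did one read")
    print(f"both ended at the same state: {log_state == cell_state}")
```

Both readers end up with the same assignment, but the log reader's cost is
proportional to how far behind it fell, while the state reader's cost is
constant. Under load, that difference is what decides whether you recover.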

This sort of topic arises in lots of other places. People complain that it
is hard to get synchronized and atomic updates to multiple files in a
distributed file system. Moral: don't use a file system when a
synchronization tool like ZK is needed.

People complain that message passing systems are slow in terms of total
write speed (can't write 100s of GB/s) and don't allow native and
efficient reading of state from a large data structure. Moral: don't use a
message store when you really want a file store.

People complain that ZK's watches only give an update to the latest state,
so they have to put messages into independent znodes and performance and
scale go to crap. Moral: don't use a synchronization system to store messages.
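
A rough sketch of why watches behave this way, again in Python with made-up
names. ToyZnode and CachingClient are in-memory stand-ins, not a real ZK
client, and the callback runs synchronously here whereas a real client is
notified asynchronously. The one thing the toy does take from ZK is that a
watch is a one-time trigger: it fires once and has to be re-registered by the
next read.

```python
class ToyZnode:
    """In-memory stand-in for a znode with one-shot watches (not a ZK client)."""

    def __init__(self, data):
        self.data = data
        self.version = 0
        self._watchers = []

    def get(self, watch=None):
        """Return (data, version), optionally registering a one-shot watch."""
        if watch is not None:
            self._watchers.append(watch)
        return self.data, self.version

    def set(self, data):
        """Store new data and fire each registered watch exactly once."""
        self.data = data
        self.version += 1
        fired, self._watchers = self._watchers, []  # watches are consumed
        for callback in fired:
            callback()


class CachingClient:
    """Keeps a cached copy and re-reads only after a watch marks it stale."""

    def __init__(self, znode):
        self.znode = znode
        self.cached = None
        self.stale = True
        self.notifications = 0
        self.seen = []  # every value this client actually read

    def _on_change(self):
        self.notifications += 1
        self.stale = True

    def current(self):
        if self.stale:
            self.stale = False  # clear first so a racing change re-marks it
            self.cached, _version = self.znode.get(watch=self._on_change)
            self.seen.append(self.cached)
        return self.cached


def demo():
    node = ToyZnode("leader=broker-1")
    client = CachingClient(node)
    client.current()                 # first read registers the watch
    for i in (2, 3, 4):              # three writes while the client is busy
        node.set(f"leader=broker-{i}")
    return client.current(), client.seen, client.notifications


if __name__ == "__main__":
    latest, seen, notifications = demo()
    print(latest, seen, notifications)
```

In this toy run the client gets a single notification for three writes and
never observes broker-2 or broker-3. For "who is master" that is exactly what
you want. For a message stream it is data loss, which is why people end up
with one znode per message.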

Getting architectural basics close to right goes a long way.

[1] https://www.usenix.org/legacy/event/usenix10/tech/full_papers/Hunt.pdf
[2] https://ai.google/research/pubs/pub41684
