zookeeper-dev mailing list archives

From Flavio Junqueira <...@apache.org>
Subject Criticism on ZK
Date Tue, 13 Feb 2018 11:02:27 GMT
Hello community,

I came across this blog post:


And I thought it would be a good idea to discuss the criticism as a community. Let me copy
the points here and add some notes:

	• Unlike Kafka it does not have a vibrant and huge community (merge those PR’s please,
I have personally met and worked with a lot of great people in this community over the years,
and as such, I probably have a pretty biased view. But, it is a common concern that we are
not fast enough at responding. We also don't have conferences and large meetups compared to
other communities. Are those really necessary, though? What can we do to be a better community?

	• It uses a protocol which is hard to understand and it’s hard to maintain a large Zookeeper
I can't really speak for the hard-to-understand part, and I don't understand what "maintain
a large ZooKeeper cluster" is referring to. How large is it and why do we need it to be large?
We have features like observers that enable large clusters, but whether they solve the problem
depends on what they are after.
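
As a rough illustration of the observer point: observers receive updates but do not vote, so adding them grows the ensemble without growing the majority quorum that writes must reach. The helper name `quorum_size` and the server counts below are made up for this sketch, and it assumes the default majority-quorum configuration.

```python
def quorum_size(voting_servers: int) -> int:
    """Smallest majority of the voting servers (observers are not counted)."""
    return voting_servers // 2 + 1


# Hypothetical ensemble: 5 voting servers, with a growing number of observers.
voters = 5
for observers in (0, 10, 50):
    total = voters + observers
    print(f"total servers={total:3d}  voters={voters}  quorum={quorum_size(voters)}")
```

In this sketch the quorum stays at 3 whether the ensemble has 5 servers or 55, which is why observers are the usual answer to read-heavy scaling. Whether that addresses the concern in the blog post depends on what "large" meant there.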

	• It’s a bit outdated, compared say with Raft
When we wrote about Zab years back, we had as a goal to explain the protocol in a way that
could be reproduced. We had other goals too, like explaining how we had been successful in
implementing a system like ZooKeeper with that protocol, the properties it guaranteed and
so on. Raft focused on the simplicity of understanding, which makes a lot of sense given that
there was interest in reproducing it. Given its focus, and clearly the quality of the people
behind it, Raft has been more successful in popularizing the implementation of replicated
state machines. At a protocol level, however, I don't think there is anything that makes Zab
outdated with respect to Raft.

	• It’s written in Java (yes, it’s opinionated but this is a problem for us as ZK is
an infrastructure component)
This is arguable; there are pros and cons both ways.

	• We run everything in Kubernetes and k8s by default has an in-built Raft implementation,
I can totally understand this point. No one wants to have to operate two systems doing similar
things. To consolidate operations, it clearly makes sense to pick one. Ironically, this post
talks about pluggability, but Kubernetes does not really give the option of using zk rather
than etcd if that's what I want to use.

	• Linearizability (if there is a word like this) - check this comparison chart
We do provide linearizable reads with sync(), although I understand that it is arguable whether
that is truly linearizable. There has been a long-running discussion about whether we should
make sync() truly linearizable by making it a first-class txn. Back in the day, we didn't
do it because we wanted reads to be fast, so we implemented it in a way that it didn't have
to go through the whole pipeline of request processors, but it still reaches out to the leader.
See the issue for more detail: https://issues.apache.org/jira/browse/ZOOKEEPER-2136
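
To make the sync()-then-read pattern concrete, here is a toy Python model. It is not ZooKeeper code: the `Leader` and `Follower` classes and their methods are invented for this sketch, and it only shows why a plain read served by a lagging server can be stale while a read issued after catching up with the leader is not. It does not capture the reasons given above for why sync() is arguably not truly linearizable.

```python
class Leader:
    """Toy leader: holds the committed log of (key, value) writes."""

    def __init__(self):
        self.log = []

    def write(self, key, value):
        self.log.append((key, value))


class Follower:
    """Toy follower: serves reads locally and may lag behind the leader."""

    def __init__(self, leader):
        self.leader = leader
        self.applied = 0  # number of leader log entries applied locally
        self.data = {}

    def catch_up(self):
        # Apply every committed entry this follower has not yet seen.
        for key, value in self.leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.leader.log)

    def read(self, key):
        # Plain read: answered from local state, possibly stale.
        return self.data.get(key)

    def sync_then_read(self, key):
        # Modeled on sync() followed by a read: reach out to the leader,
        # catch up to its committed state, then answer locally.
        self.catch_up()
        return self.read(key)


leader = Leader()
follower = Follower(leader)

leader.write("/config", "v1")
follower.catch_up()
leader.write("/config", "v2")  # the follower has not applied this write yet

stale = follower.read("/config")
fresh = follower.sync_then_read("/config")
print(stale, fresh)
```

The plain read returns the old value and the read after the sync step returns the new one. The cost is the extra round trip to the leader, which is the trade-off against fast reads described above.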

	• Performance and inherent scalability issues
I don't know if those experiments were done using a dedicated device for the txn log, which
is well known to matter for zk's performance. Incremental snapshotting is clearly a good way
to reduce the amount of disk load for snapshots, but I wonder whether that's really a primary
concern given that servers these days often have multiple devices.

I don't understand the max CPU utilization figure for zk (https://coreos.com/blog/performance-of-etcd.html).
Perhaps this is something to be investigated.

	• Client side complexity and thick clients
Due to the set of features we wanted to offer, we have indeed chosen this path. 

	• Lack of service discovery
I don't have a good sense of how many users are actually bothered by this. I have heard complaints
over time about service discovery with ZooKeeper, but I'm not sure there was any conclusion
about whether service discovery is a good use case for such coordination systems, including
etcd for that matter.

Any feedback?
