From dev-return-67467-archive-asf-public=cust-asf.ponee.io@zookeeper.apache.org  Tue Feb 13 17:10:48 2018
Mailing-List: contact dev-help@zookeeper.apache.org; run by ezmlm
Precedence: bulk
List-Help: <mailto:dev-help@zookeeper.apache.org>
List-Unsubscribe: <mailto:dev-unsubscribe@zookeeper.apache.org>
List-Post: <mailto:dev@zookeeper.apache.org>
List-Id: <dev.zookeeper.apache.org>
Reply-To: dev@zookeeper.apache.org
Delivered-To: mailing list dev@zookeeper.apache.org
MIME-Version: 1.0
In-Reply-To: <B6D2F518-AE6B-4F96-83D9-7A94A7A33972@jordanzimmerman.com>
References: <7CAE4606-1F3A-42D9-8659-C2538B776402@apache.org> <B6D2F518-AE6B-4F96-83D9-7A94A7A33972@jordanzimmerman.com>
From: Patrick Hunt <phunt@apache.org>
Date: Tue, 13 Feb 2018 08:10:02 -0800
X-Gmail-Original-Message-ID: <CANLc_9Jvv9KUwPbpfG3zyz-4d5XGN8dquAGmCMOCD9M5zvyQQQ@mail.gmail.com>
Message-ID: <CANLc_9Jvv9KUwPbpfG3zyz-4d5XGN8dquAGmCMOCD9M5zvyQQQ@mail.gmail.com>
Subject: Re: Criticism on ZK
To: DevZooKeeper <dev@zookeeper.apache.org>
Content-Type: multipart/alternative; boundary="001a114b2e7e95162405651a3cad"

--001a114b2e7e95162405651a3cad
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

On Tue, Feb 13, 2018 at 6:47 AM, Jordan Zimmerman <
jordan@jordanzimmerman.com> wrote:

> > • Unlike Kafka it does not have a vibrant and huge community (merge
> those PR’s please, anyone?)
>
> This is clearly true. The community was active 5 or so years ago but in
> the past few years it's almost non-existent. Patrick is the only active
> committer. It can take years (!!) and numerous cajoling emails to get
> engagement on pull requests. Releases happen only once or twice a year. The
> worst culprit has been the so-called alpha/beta of 3.5.x. Whatever the
> beliefs of the ZooKeeper team are, 3.5.x has been in production at major
> tech companies for _years_ yet it's still treated as a non-released
> version. Even if we were to accept the alpha/beta label, the original 3.5.0
> alpha was 3 and a half years ago! That's crazy and has contributed
> dramatically to the negative perception of ZK.
>
> > It uses a protocol which is hard to understand and it’s hard to maintain
> a large Zookeeper cluster
>
> This is a red herring. Raft may be easy to understand from the whitepaper
> but any distributed protocol is difficult in practice. Further, no user of
> a tool such as etcd or ZooKeeper remotely cares about the protocol. That's
> an implementation detail.
>
> > It’s a bit outdated, compared say with Raft
>
> Another red herring. Raft and ZAB are, essentially, the same protocol.
>
> > It’s written in Java (yes, it’s opinionated but this is a problem for us
> as ZK is an infrastructure component)
>
> There is a current bias against Java. The reasons for this are beyond the
> scope of what we can discuss here. But, in my view, it's ludicrous. That
> said, the non-Java clients for ZooKeeper are lacking and this is a problem.
> I don't believe there is a good Go client for ZooKeeper for example.
>
> > We run everything in Kubernetes and k8s by default has an in-built Raft
> implementation, etcd
>
> etcd is a good key/value store. However, I'm not sure how well it does
> for leaders/locks/etc. at scale. Also, is there a good Java/JVM client for
> it? I know they've been working on one but what is its status? We are
> working against trends in the DevOps world here. DevOps has moved almost
> entirely to Go and the Hashicorp borg. If it's not in Go they're not really
> interested. This is not a problem for ZooKeeper as it addresses a different
> space - applications. But, the Ops people IMO confuse the two products and
> think "we already have etcd why do we need another system to support." A
> good white paper detailing the real differences between etcd/consul and
> ZooKeeper is needed.
>
> > Linearizability (if there is a word like this) - check this comparison
> chart
>
> This is just wrong. All operations in ZooKeeper are ordered. This, I
> think, comes up when using etcd as a k/v store. These two use cases,
> locks/leaders/register vs k/v store keep coming up. ZooKeeper is not a
> database. etcd _can_ be used as a k/v store.
>
> > Performance and inherent scalability issues
>
> ZK's performance is better than etcd AFAIK for the use cases it was
> designed for. However, operating ZooKeeper can be a bear. I know that it's
> very difficult to find qualified ops engineers who can manage ZK ensembles
> at high scale. In particular, if ZK is used as a quasi-database it can be
> very difficult to operate (we're having that problem at Elasticsearch
> Cloud).
>
> > Client side complexity and thick clients
>
> Well, as the author of Apache Curator, I don't see why this is a problem.
> What does it matter if the client does a lot of the work or the server?
> It's opaque to application writers. In any event, most of the "recipes" in
> Curator are not in-the-box with consul/etcd. These need to be written and
> then you have a thick client again. Most of the things you want to do with
> ZooKeeper are already implemented in Curator. However, if you're not on the
> JVM you don't get those.
>
> > Lack of service discovery
>
> Curator has had Service Discovery since its beginning:
> http://curator.apache.org/curator-x-discovery/index.html
>
> -Jordan
>
> > On Feb 13, 2018, at 6:02 AM, Flavio Junqueira <fpj@apache.org> wrote:
> >
> > Hello community,
> >
> > I came across this blog post:
> >
> >      https://banzaicloud.com/blog/kafka-on-etcd/
> >
> > And I thought it would be a good idea to discuss the criticism as a
> community. Let me copy the points here and add some notes:
> >
> >       • Unlike Kafka it does not have a vibrant and huge community
> (merge those PR’s please, anyone?)
> > I have personally met and worked with a lot of great people in this
> community over the years, and as such, I probably have a pretty biased
> view. But, it is a common concern that we are not fast enough at
> responding. We also don't have conferences and large meetups compared to
> other communities. Are those really necessary, though? What can we do to be
> a better community?
> >
> >       • It uses a protocol which is hard to understand and it’s hard to
> maintain a large Zookeeper cluster
> > I can't really speak for the hard to understand part, and I don't
> understand what "maintain a large ZooKeeper cluster" is referring to. How
> large is it and why do we need it to be large? We have features like
> observers that enable large clusters, but whether it solves the problem
> depends on what they are after.
> >
> >       • It’s a bit outdated, compared say with Raft
> > When we wrote about Zab years back, we had as a goal to explain the
> protocol in a way that could be reproduced. We had other goals too, like
> explaining how we had been successful in implementing a system like
> ZooKeeper with that protocol, the properties it guaranteed and so on. Raft
> focused on the simplicity of understanding, which makes a lot of sense
> given that there was interest in reproducing it. Given its focus, and
> clearly the quality of the people behind it, Raft has been more successful
> in popularizing the implementation of replicated state machines. At a
> protocol level, however, I don't think there is anything that makes Zab
> outdated with respect to Raft.
> >
> >       • It’s written in Java (yes, it’s opinionated but this is a
> problem for us as ZK is an infrastructure component)
> > This is arguable; there are pros and cons both ways.
> >
> >       • We run everything in Kubernetes and k8s by default has an
> in-built Raft implementation, etcd
> > I can totally understand this point. No one wants to have to operate two
> systems doing similar things. To consolidate operations, it clearly makes
> sense to pick one. Ironically, this post talks about pluggability, but
> Kubernetes does not really give the option of using zk rather than etcd if
> that's what I want to use.
> >
> >       • Linearizability (if there is a word like this) - check this
> comparison chart
> > We do provide linearizable reads with sync(), although I understand that
> it is arguable whether that is truly linearizable. There has been a
> long-running discussion about whether we should make sync() truly
> linearizable by making it a first-class txn. Back in the day, we didn't do
> it because we wanted reads to be fast, so we implemented it in a way that
> it didn't have to go through the whole pipeline of request processors, but
> it still reaches out to the leader. See the issue for more detail:
> https://issues.apache.org/jira/browse/ZOOKEEPER-2136
> >
> >       • Performance and inherent scalability issues
> > I don't know if those experiments were done using a device dedicated to
> the txn log, which is well known to matter for zk's performance. Incremental
> snapshotting is clearly a good way to reduce the amount of disk load for
> snapshots, but I wonder whether that's really a primary concern given that
> servers these days often have multiple devices.
> >
> > I don't understand that max CPU utilization for zk (
> https://coreos.com/blog/performance-of-etcd.html). Perhaps this is
> something to be investigated.
> >
> >       • Client side complexity and thick clients
> > Due to the set of features we wanted to offer, we have indeed chosen
> this path.
> >
> >       • Lack of service discovery
> > I don't have a good sense of how many users are actually bothered by
> this. I have heard complaints over time about service discovery with
> ZooKeeper, but I'm not sure there was any conclusion about whether service
> discovery is a good use case for such coordination systems, including etcd
> for that matter.
> >
> > Any feedback?
> >
>

I would add:

* we have a huge install base.

Our users value backward compatibility and "it just works". Both of these
are major factors wrt the items you've listed. When you're new and with few
users you can "move fast and break things". ZK has been around for 10+
years now and is providing core capabilities for many systems. Also we
didn't spend a lot of time up front enabling change, e.g. things like
protobufs/netty/... didn't exist (or were very new?) at the time we built
the original system.

* I suspect as a result of these factors companies tend not to pay people
to work on ZK. iiuc these other systems have companies that do.


Patrick


> > Thanks,
> > -Flavio
>
>

--001a114b2e7e95162405651a3cad--