From: Andor Molnar
Date: Wed, 13 Jun 2018 13:42:23 +0200
Subject: Re: Kafka Failing to start due to existing ID
To: user@zookeeper.apache.org

Hi Harish,

I see 2 things which need to be clarified here:

1. A ZooKeeper session dies in only 2 cases: when the client explicitly
closes the session (which is *not* equivalent to a disconnection) or when
the session timeout expires.
2. If quorum is not present, no updates are committed and client
connections are rejected, so Kafka shouldn't be able to use the cluster.
Similarly, when quorum comes back online, ZooKeeper continues operating
normally: it accepts client connections, performs updates and expires
sessions if necessary.

I therefore still believe that your Kafka setup doesn't properly clean up
znodes for some reason, but I'm not a Kafka expert.

Regards,
Andor

On Wed, Jun 13, 2018 at 12:34 AM, harish lohar wrote:

> Exactly, so in a case where there is no quorum and no update can be made,
> is there a way to stop Kafka failing to start?
>
> One way is to clean up Kafka-related znodes after bringing up quorum and
> then starting Kafka.
>
> I was looking to avoid this.
>
>
> On Tue, Jun 12, 2018 at 4:59 PM Brian Lininger wrote:
>
> > Hi Harish,
> > I think I see what may be the problem for you. Based on your initial
> > description (6 ZK nodes, 3 down) I think the problem is that you no
> > longer have a quorum. When a ZooKeeper cluster is running, updates
> > (e.g. removing znodes) can only occur when ZooKeeper has a quorum,
> > which is a strict majority (more than 50%) of the configured ZooKeeper
> > nodes. If I understand correctly, then in your case you have 6
> > ZooKeeper nodes configured but 3 are down. This means that you only
> > have 50.0% of the ZooKeeper cluster working, and thus ZooKeeper does
> > not have a quorum, so no updates can be made. I don't know much about
> > the new TTL feature in 3.5, but my assumption is that it works on this
> > same principle, which is that no updates can be made to the cluster's
> > znodes when there is no quorum.
> > The same applies to the 3-node ZooKeeper cluster: you must have 2
> > nodes running to form a quorum and allow any updates to occur.
> >
> > Please correct me if I missed something....
> >
> > Thanks,
> > Brian
> >
> >
> > On Tue, Jun 12, 2018 at 1:33 PM, harish lohar wrote:
> >
> >> ---------- Forwarded message ---------
> >> From: harish lohar
> >> Date: Tue, Jun 12, 2018 at 3:26 PM
> >> Subject: Re: Kafka Failing to start due to existing ID
> >> To:
> >>
> >>
> >> Hi Andor,
> >>
> >> Thanks for your reply.
> >>
> >> This issue is irrespective of the number of nodes; it should be seen
> >> with a 3-node cluster as well.
> >>
> >> Actually Kafka has a session_timeout config, but that seems to be in
> >> effect only if the ZooKeeper cluster is up, i.e. if Kafka goes down
> >> while the ZooKeeper cluster is up.
> >>
> >> Now let's say 2 nodes of the ZooKeeper cluster are down, and then the
> >> Kafka connected to the 3rd ZooKeeper node goes down: the ZooKeeper
> >> cluster doesn't refresh the session for the Kafka connected to the
> >> 3rd node.
> >>
> >> So when another node comes up and the ZooKeeper cluster becomes
> >> available, it doesn't delete the ID of the Kafka which went down while
> >> the ZooKeeper cluster was down.
> >>
> >> Regarding TTL, I have already asked on the Kafka forum and am awaiting
> >> a reply.
> >>
> >> Ideally, once the ZooKeeper cluster is up, it should delete the Kafka
> >> broker IDs which are not connected, which doesn't seem to be happening.
> >>
> >> I hope I am making some sense :)
> >>
> >> Thanks
> >> harish
> >>
> >>
> >>
> >> On Tue, Jun 12, 2018 at 2:59 PM Andor Molnár wrote:
> >>
> >> > Hi Harish,
> >> >
> >> >
> >> > I have a few questions to get some insight about your issue.
> >> >
> >> > 1. Why do you run ZooKeeper with 6 nodes when an odd number of nodes
> >> > is recommended (not really an issue, just out of curiosity)?
> >> >
> >> > 2. Does Kafka support ZK 3.5+ with TTL nodes?
> >> >
> >> > I think this is more of a Kafka question, but afaik Kafka doesn't run
> >> > and cannot take advantage of 3.5-only features of ZK. Maybe I'm
> >> > wrong, but I think it has some cleanup mechanism to delete expired
> >> > broker IDs, or you must wait for the session to expire.
> >> >
> >> >
> >> > Regards,
> >> >
> >> > Andor
> >> >
> >> >
> >> >
> >> > On 06/12/2018 04:39 PM, harish lohar wrote:
> >> >
> >> > Hi All,
> >> >
> >> > I need help with the scenario below, if any configuration is
> >> > available to help.
> >> >
> >> > I have a cluster of 6 nodes.
> >> > 3 nodes are stopped and brought up again; Kafka fails to restart
> >> > since the broker IDs are still present in the ZooKeeper znode
> >> > /brokers/ids/
> >> >
> >> > Since the cluster goes down after removing 3 nodes, the session
> >> > timeout doesn't happen.
> >> >
> >> > I am aware of the TTL feature in ZooKeeper, but how do I make sure
> >> > Kafka creates znodes with TTL?
> >> >
> >> > Thanks
> >> > Harish
> >> >
> >
> > --
> >
> > *Brian Lininger*
> > Technical Architect, Infrastructure & Search
> > *Veeva Systems*
> > brian.lininger@veeva.com
> > www.veeva.com
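
For reference, the quorum arithmetic discussed in this thread can be
sketched as below. This is only an illustration, assuming ZooKeeper's
default majority quorum (not hierarchical or weighted quorums); the
function names are hypothetical and are not part of any ZooKeeper or Kafka
API.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority.

    Assumption: default majority quorum, i.e. more than half of the
    configured servers must be up.
    """
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True when the live servers form a strict majority of the ensemble."""
    return servers_up >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a quorum still exists."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    # Compare the ensemble sizes mentioned in the thread (3 and 6) with 5.
    for n in (3, 5, 6):
        print(f"ensemble={n} quorum={quorum_size(n)} "
              f"tolerated_failures={tolerated_failures(n)}")
    # The scenario from the thread: 6 configured servers, 3 of them down.
    print("6 configured, 3 up, quorum present:", has_quorum(6, 3))
```

Under this assumption a 6-node ensemble needs 4 live servers, so with 3
down it has no quorum, while a 3-node ensemble needs 2. A 6-node ensemble
tolerates the same number of failures (2) as a 5-node one, which is why an
odd number of nodes is usually recommended.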