zookeeper-user mailing list archives

From harish lohar <hklo...@gmail.com>
Subject Re: Kafka Failing to start due to existing ID
Date Mon, 18 Jun 2018 15:55:03 GMT
Just to update everyone: I was finally able to root-cause the issue, and it
seems to be

https://issues.apache.org/jira/browse/ZOOKEEPER-2901

which is related to the node ID being > 127.

It's fixed in 3.5.4-beta, and it works fine there.

Thanks
Harish
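
Why 127 is the threshold can be shown with a little bit arithmetic. The sketch below assumes that the ZooKeeper server id is placed in the top byte of each 64-bit session ID, and that a session ID with its sign bit set is what ZOOKEEPER-2901 describes as being misread when the ephemeral type is calculated. The helper names are hypothetical and this is not ZooKeeper's own code; it only illustrates the arithmetic.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value: int) -> int:
    """Interpret the low 64 bits of value as a Java-style signed long."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def session_id_high_bits(server_id: int) -> int:
    """Assumed layout: the server id occupies the top byte of the session ID."""
    return to_signed64(server_id << 56)


def session_id_is_negative(server_id: int) -> bool:
    """True when the server id sets the sign bit of the 64-bit session ID."""
    return session_id_high_bits(server_id) < 0


if __name__ == "__main__":
    for server_id in (1, 127, 128, 255):
        high_bits = session_id_high_bits(server_id) & MASK64
        print(server_id, hex(high_bits), session_id_is_negative(server_id))
```

Under these assumptions, ids up to 127 leave the sign bit clear, while 128 and above set it, which matches the "> 127" condition reported above.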


On Wed, Jun 13, 2018 at 7:42 AM Andor Molnar <andor@cloudera.com> wrote:

> Hi Harish,
>
> I see 2 things which need to be clarified here:
>
> 1. A ZooKeeper session dies in only 2 cases: when the client explicitly closes
> the session (which is *not* equivalent to disconnection) or when the session
> timeout expires,
> 2. If quorum is not present, no updates are committed and client connections
> are rejected, so Kafka shouldn't be able to use the cluster.
>
> Similarly, when quorum comes back online, ZooKeeper will continue operating
> normally: it accepts client connections, performs updates and expires
> sessions if necessary.
>
> I therefore still believe that your Kafka setup doesn't properly clean up
> znodes for some reason, but I'm not a Kafka expert.
>
> Regards,
> Andor
>
>
>
>
> On Wed, Jun 13, 2018 at 12:34 AM, harish lohar <hklohar@gmail.com> wrote:
>
> > Exactly, so in a case where there is no quorum and no update can be made,
> > is there a way to stop Kafka failing to start?
> >
> > One way is to clean up the Kafka-related znodes after bringing up the quorum
> > and then start Kafka.
> >
> > I was looking to avoid this.
> >
> >
> > On Tue, Jun 12, 2018 at 4:59 PM Brian Lininger <brian.lininger@veeva.com>
> > wrote:
> >
> > > Hi Harish,
> > > I think I see what may be the problem for you.  Based on your initial
> > > description (6 ZK nodes, 3 down) I think the problem is that you no longer
> > > have a quorum.  When a Zookeeper cluster is running, updates (e.g. removing
> > > znodes) can only occur when Zookeeper has a quorum, which is a strict
> > > majority (more than 50%) of the configured Zookeeper nodes.  If I understand
> > > correctly, then in your case you have 6 Zookeeper nodes configured but 3 are
> > > down.  This means that only 50% of the Zookeeper cluster is working, and thus
> > > Zookeeper does not have a quorum, so no updates can be made.  I don't know
> > > much about the new TTL feature in 3.5, but my assumption is that it works on
> > > the same principle: no updates can be made to the cluster's znodes when
> > > there is no quorum.  The same applies to a 3-node Zookeeper cluster: you
> > > must have 2 nodes running to form a quorum and allow any updates to occur.
> > >
> > > Please correct me if I missed something....
> > >
> > > Thanks,
> > > Brian
> > >
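
The quorum arithmetic in the message above can be checked with a short sketch. It assumes ZooKeeper's default majority quorum (no hierarchical groups or weights); the function names are hypothetical and chosen only for illustration.

```python
def quorum_size(ensemble_size: int) -> int:
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def has_quorum(ensemble_size: int, servers_up: int) -> bool:
    """True when enough servers are up to commit updates."""
    return servers_up >= quorum_size(ensemble_size)


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


if __name__ == "__main__":
    for size in (3, 5, 6):
        print(size, quorum_size(size), tolerated_failures(size))
```

With 6 configured servers the majority is 4, so 3 running servers are not enough, while a 3-server ensemble needs 2. A 6-server ensemble tolerates the same number of failures as a 5-server one, which is the usual reason an odd ensemble size is recommended.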
> > >
> > > On Tue, Jun 12, 2018 at 1:33 PM, harish lohar <hklohar@gmail.com> wrote:
> > >
> > >> ---------- Forwarded message ---------
> > >> From: harish lohar <hklohar@gmail.com>
> > >> Date: Tue, Jun 12, 2018 at 3:26 PM
> > >> Subject: Re: Kafka Failing to start due to existing ID
> > >> To: <andor@apache.org>
> > >>
> > >>
> > >> Hi Andor,
> > >>
> > >> Thanks for your reply.
> > >>
> > >> This issue is independent of the number of nodes; it should be seen with a
> > >> 3-node cluster as well.
> > >>
> > >> Kafka does have a session timeout config, but that seems to take effect
> > >> only if the ZooKeeper cluster is up, i.e. if Kafka goes down while the
> > >> ZooKeeper cluster is up.
> > >>
> > >> Now let's say 2 nodes of the ZooKeeper cluster are down, and then the Kafka
> > >> broker connected to the 3rd ZooKeeper node goes down. The ZooKeeper cluster
> > >> doesn't refresh the session for the Kafka broker connected to the 3rd node.
> > >>
> > >> So when another node comes up and the ZooKeeper cluster becomes available
> > >> again, it doesn't delete the ID of the Kafka broker that went down while
> > >> the ZooKeeper cluster was down.
> > >>
> > >> Regarding TTL, I have already asked on the Kafka forum and am awaiting a
> > >> reply.
> > >>
> > >> Ideally, once the ZooKeeper cluster is up, it should delete the Kafka broker
> > >> IDs that are no longer connected, which doesn't seem to be happening.
> > >>
> > >> I hope I am making some sense :)
> > >>
> > >> Thanks
> > >> harish
> > >>
> > >>
> > >>
> > >> On Tue, Jun 12, 2018 at 2:59 PM Andor Molnár <andor@apache.org> wrote:
> > >>
> > >> > Hi Harish,
> > >> >
> > >> >
> > >> > I have a few questions to get some insight about your issue.
> > >> >
> > >> > 1. Why do you run ZooKeeper with 6 nodes when an odd number of nodes is
> > >> > recommended? (Not really an issue, just out of curiosity.)
> > >> >
> > >> > 2. Does Kafka support ZK 3.5+ with TTL nodes?
> > >> >
> > >> > I think this is more of a Kafka question, but afaik Kafka doesn't run
> > >> > with, and cannot take advantage of, 3.5-only features of ZK. Maybe I'm
> > >> > wrong, but I think it has some cleanup mechanism to delete expired broker
> > >> > ids, or you must wait for the session to expire.
> > >> >
> > >> >
> > >> > Regards,
> > >> >
> > >> > Andor
> > >> >
> > >> >
> > >> >
> > >> > On 06/12/2018 04:39 PM, harish lohar wrote:
> > >> >
> > >> > Hi All,
> > >> >
> > >> > I need help with the scenario below, if any configuration is available to
> > >> > help.
> > >> >
> > >> > I have a cluster of 6 nodes.
> > >> > When 3 nodes are stopped and brought up again, Kafka fails to restart
> > >> > because the broker IDs are still present under the ZooKeeper znode
> > >> > /brokers/ids/
> > >> >
> > >> > Since the cluster goes down after removing 3 nodes, the session timeout
> > >> > doesn't happen.
> > >> >
> > >> > I am aware of the TTL feature in ZooKeeper, but how do I make sure
> > >> > Kafka creates its znodes with a TTL?
> > >> >
> > >> > Thanks
> > >> > Harish
> > >> >
> > >> >
> > >> >
> > >> >
> > >>
> > >
> > >
> > >
> > > --
> > >
> > > *Brian Lininger*
> > > Technical Architect, Infrastructure & Search
> > > *Veeva Systems *
> > > brian.lininger@veeva.com
> > > www.veeva.com
> > >
> > >
> >
>
