zookeeper-dev mailing list archives

From Alexander Shraer <shra...@gmail.com>
Subject Re: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file can not be updated automatically after some zookeeper servers of zk cluster are down
Date Thu, 26 Sep 2019 04:34:33 GMT
exactly, thank you Michael :)

On Wed, Sep 25, 2019 at 9:32 PM Michael Han <hanm@apache.org> wrote:

> >> There was recently a post here from someone who has implemented this
>
> Maybe this one?
>
> http://zookeeper-user.578899.n2.nabble.com/About-ZooKeeper-Dynamic-Reconfiguration-td7584271.html
>
> On Wed, Sep 25, 2019 at 9:19 PM Alexander Shraer <shralex@gmail.com> wrote:
>
> > There was recently a post here from someone who has implemented this,
> > but I couldn't find it for some reason.
> >
> > Essentially I think that you'd need to monitor the "health" and
> > connectivity of servers to the leader, and issue reconfig commands to
> > remove them when you suspect that they're down or add them back when you
> > think they're up.
> > Notice that a quorum of the ensemble always has to be up: if a quorum is
> > lost, neither a reconfig command nor any other command will work.
> > You could use the information exposed in ZK's 4 letter commands to decide
> > whether you think a server is up and connected to the quorum or down.
> > Ideally we could also use the leader's view on who is connected,
> > but it doesn't look like this is being exposed right now. You can also
> > periodically issue test read/write operations on various servers to check
> > if they're really operational.
> >
> >
> https://github.com/apache/zookeeper/blob/1ca627b5a3105d80ed4d851c6e9f1a1e2ac7d64a/zookeeper-docs/src/main/resources/markdown/zookeeperAdmin.md#sc_4lw
> >
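
The 4 letter command check described above could be sketched roughly as
follows. This is only an illustration, not an existing tool: the function
names and the health rule are assumptions, and it presumes the "mntr"
command is enabled on the servers (in 3.5.x it may need to be listed in
4lw.commands.whitelist).

```python
import socket

def four_letter_word(host, port, cmd, timeout=2.0):
    """Send a four-letter command (e.g. "ruok", "srvr", "mntr") and return the reply text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(cmd.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection after replying
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")

def parse_mntr(text):
    """Parse the tab-separated "key<TAB>value" lines of an mntr reply into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition("\t")
        if sep:  # skip lines without a tab, e.g. error messages
            stats[key.strip()] = value.strip()
    return stats

def looks_connected(host, port=2181):
    """Hypothetical health rule: reachable and reporting leader or follower state."""
    try:
        stats = parse_mntr(four_letter_word(host, port, "mntr"))
    except OSError:  # refused, unreachable, or timed out
        return False
    return stats.get("zk_server_state") in ("leader", "follower")
```

The rule only counts participants; an ensemble with observers would need
to accept their state as well.
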
> > As accurate failure detection is impossible in asynchronous systems,
> > you'll need to decide how sensitive you are to potential failures vs.
> > false suspicions.
> >
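
One way to make that sensitivity an explicit knob is to act only after
several consecutive probe results agree. A minimal sketch, assuming a
hypothetical SuspicionTracker class and arbitrary thresholds (neither is
part of ZooKeeper):

```python
class SuspicionTracker:
    """Turn individual probe results into "down"/"up" transitions."""

    def __init__(self, down_after=3, up_after=2):
        self.down_after = down_after  # failed probes in a row before suspecting
        self.up_after = up_after      # good probes in a row before trusting again
        self._streak = {}             # server id -> (last result, consecutive count)
        self._suspected = set()

    def observe(self, server_id, healthy):
        """Record one probe result; return "down", "up", or None if nothing changed."""
        last, count = self._streak.get(server_id, (healthy, 0))
        count = count + 1 if last == healthy else 1
        self._streak[server_id] = (healthy, count)
        if not healthy and server_id not in self._suspected and count >= self.down_after:
            self._suspected.add(server_id)
            return "down"
        if healthy and server_id in self._suspected and count >= self.up_after:
            self._suspected.discard(server_id)
            return "up"
        return None
```

A lower down_after reacts faster to real failures but removes healthy
servers more often; a higher one does the opposite.
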
> > Hope this helps...
> >
> > Alex
> >
> > On Wed, Sep 25, 2019 at 6:00 PM Gao,Wei <Wei.Gao@arcserve.com> wrote:
> >
> > > Hi Alexander Shraer,
> > >  Could you please tell me how to implement automation on top?
> > > Thank you very much!
> > >
> > > -----Original Message-----
> > > From: Alexander Shraer (Jira) <jira@apache.org>
> > > Sent: Thursday, September 26, 2019 1:27 AM
> > > To: issues@zookeeper.apache.org
> > > Subject: [jira] [Commented] (ZOOKEEPER-3556) Dynamic configuration file
> > > can not be updated automatically after some zookeeper servers of zk
> > > cluster are down
> > >
> > >
> > >     [
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_ZOOKEEPER-2D3556-3Fpage-3Dcom.atlassian.jira.plugin.system.issuetabpanels-3Acomment-2Dtabpanel-26focusedCommentId-3D16937925-23comment-2D16937925&d=DwIFaQ&c=ZmK7amRlbztwfC_NTU_hNw&r=bTmnMF5RGYcfg4qOcKQAYjkGGUtOB2jR22ryrk8hNWk&m=UNFnO3kfjtUL8Jievmh9VMXf_nTLKBCfuJsaxe6FshU&s=XxgusqUbHgFrxTfTTcYuxMWxol3W-1dJ7WVzUqh1HAE&e=
> > > ]
> > >
> > > Alexander Shraer commented on ZOOKEEPER-3556:
> > > ---------------------------------------------
> > >
> > > The described behavior is not a bug: currently reconfiguration requires
> > > explicit action by an operator. One could implement automation on top.
> > > We should consider this as a feature, since it sounds like several
> > > adopters have implemented such automation. Perhaps one of them could
> > > contribute this upstream.
> > >
> > > > Dynamic configuration file can not be updated automatically after some
> > > > zookeeper servers of zk cluster are down
> > > > ----------------------------------------------------------------------
> > > >
> > > >                 Key: ZOOKEEPER-3556
> > > >                 URL:
> > >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__issues.apache.org_jira_browse_ZOOKEEPER-2D3556&d=DwIFaQ&c=ZmK7amRlbztwfC_NTU_hNw&r=bTmnMF5RGYcfg4qOcKQAYjkGGUtOB2jR22ryrk8hNWk&m=UNFnO3kfjtUL8Jievmh9VMXf_nTLKBCfuJsaxe6FshU&s=NQvX26JbBDNMmEtQhirmYk7ELe46vCjn4kbm1VqcNsA&e=
> > > >             Project: ZooKeeper
> > > >          Issue Type: Wish
> > > >          Components: java client
> > > >    Affects Versions: 3.5.5
> > > >            Reporter: Steven Chan
> > > >            Priority: Major
> > > >   Original Estimate: 12h
> > > >  Remaining Estimate: 12h
> > > >
> > > > I encountered a problem which blocks my development of load balancing
> > > > using ZooKeeper 3.5.5.
> > > > I have a ZooKeeper cluster which comprises five zk servers, and the
> > > > dynamic configuration file is as follows:
> > > >
> > > > server.1=zk1:2888:3888:participant;0.0.0.0:2181
> > > > server.2=zk2:2888:3888:participant;0.0.0.0:2181
> > > > server.3=zk3:2888:3888:participant;0.0.0.0:2181
> > > > server.4=zk4:2888:3888:participant;0.0.0.0:2181
> > > > server.5=zk5:2888:3888:participant;0.0.0.0:2181
> > > >
> > > > The zk cluster works fine if every member works normally. However, if,
> > > > say, two of them suddenly go down without prior notice, the dynamic
> > > > configuration file shown above is not updated dynamically, which
> > > > causes the zk cluster to stop working normally.
> > > > In my view, if server 1 and server 5 suddenly go down, the dynamic
> > > > configuration file should be modified to the following:
> > > >
> > > > server.2=zk2:2888:3888:participant;0.0.0.0:2181
> > > > server.3=zk3:2888:3888:participant;0.0.0.0:2181
> > > > server.4=zk4:2888:3888:participant;0.0.0.0:2181
> > > >
> > > > But in this case, the dynamic configuration file will never change
> > > > automatically unless you manually revise it.
> > > > I think this is a very common case which may happen at any time. If
> > > > so, how can we handle it?
> > >
> > >
> > >
> > > --
> > > This message was sent by Atlassian Jira
> > > (v8.3.4#803005)
> > >
> >
>
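
For the five-server example in the issue, the decision step of such
automation could look roughly like this. It is a sketch under
assumptions: has_quorum and plan_removal are hypothetical helpers, the
returned string is meant for the zkCli "reconfig" command, and it
presumes reconfiguration is enabled and authorized on the ensemble
(reconfigEnabled=true in recent 3.5.x releases).

```python
def has_quorum(alive, members):
    """True if a strict majority of the current members is alive."""
    return len(alive & members) > len(members) // 2

def plan_removal(members, alive):
    """Return a zkCli reconfig command removing the down servers, or None.

    None means there is nothing to remove, or a quorum of the current
    configuration is already lost and no reconfig could be committed.
    """
    members = set(members)
    alive = set(alive) & members
    down = sorted(members - alive)
    if not down or not has_quorum(alive, members):
        return None
    return "reconfig -remove " + ",".join(str(sid) for sid in down)

# Servers 1 and 5 are down, 2, 3 and 4 still form a quorum of the five.
print(plan_removal({1, 2, 3, 4, 5}, {2, 3, 4}))
```

Note that shrinking to three servers also shrinks the fault tolerance to
a single failure, so automation along these lines would normally add
servers back once they are considered up again.
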
