zookeeper-dev mailing list archives

From "Cesar Stuardo (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2865) Reconfig Causes Inconsistent Configuration file among the nodes
Date Fri, 01 Sep 2017 19:31:00 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16151051#comment-16151051 ]

Cesar Stuardo commented on ZOOKEEPER-2865:
------------------------------------------

Hello Alexander,

In your first comment, you state:
----
But what's required is for the cluster to be able to recover from this state - the server
that didn't get the commit in your scenario should find out about the new config and eventually
join the cluster. If that doesn't happen then that potentially is a bug, but its not clear
from the description here.
----

What do you mean by this? In our scenario, the node won't be able to recover: the nodes
it knows about at startup are no longer listening on the same ports, so it never receives
the updated configuration. The only solution is admin intervention.
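
To make the scenario concrete, here is a toy sketch of why the node stays isolated. It is not ZooKeeper code: the host names, ports, and the `can_rejoin` helper are made up for illustration, and each node is reduced to the set of addresses it listens on.

```python
# Hypothetical model, not ZooKeeper code: a stale node can only rejoin if at
# least one address from its old config is still being listened on.

def can_rejoin(known_peers, listening_now):
    """Return True if the stale node can reach at least one live peer.

    known_peers   -- addresses in the stale node's config (what it will dial)
    listening_now -- addresses the rest of the ensemble actually listens on
    """
    return any(addr in listening_now for addr in known_peers)


# Node X still holds the pre-reconfig view of the ensemble (ports are made up).
node_x_config = {"host1:2888", "host2:2888"}

# After the reconfig, the other nodes moved to new ports and closed the old ones.
ensemble_after_reconfig = {"host1:2890", "host2:2890", "host3:2890", "host4:2890"}

# No overlap between what node X dials and what is open: node X is isolated.
print(can_rejoin(node_x_config, ensemble_after_reconfig))

# If even one old address had stayed open, node X could learn the new config.
print(can_rejoin(node_x_config, ensemble_after_reconfig | {"host1:2888"}))
```

Under these assumptions the first check fails and the second succeeds, which is the difference between needing admin intervention and being able to catch up.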


> Reconfig Causes Inconsistent Configuration file among the nodes
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2865
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2865
>             Project: ZooKeeper
>          Issue Type: Improvement
>          Components: documentation
>    Affects Versions: 3.5.3
>            Reporter: Jeffrey F. Lukman
>            Assignee: Alexander Shraer
>            Priority: Trivial
>             Fix For: 3.5.4, 3.6.0
>
>         Attachments: ZK-2865.pdf
>
>
> When we run our Distributed system Model Checking (DMCK) in ZooKeeper v3.5.3
> by following the workload in ZK-2778:
> - initially start 2 ZooKeeper nodes
> - start 3 new nodes
> - do a reconfiguration (the complete reconfiguration is attached in the document)
> We think our DMCK found the following bug:
> - while one of the newly joined nodes (call it node X) has not yet received the latest
> configuration update, the initial leader node closed its port,
> therefore causing node X to be isolated.
> For complete details of the bug, please see the attached document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
