zookeeper-dev mailing list archives

From "Alexander Shraer (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2865) Reconfig Causes Inconsistent Configuration file among the nodes
Date Sat, 05 Aug 2017 07:13:02 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2865?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16115308#comment-16115308 ]

Alexander Shraer commented on ZOOKEEPER-2865:
---------------------------------------------

Thanks [~jeffreyflukman]! 

We don't try to guarantee that every member of the new config receives the proposal message
of a reconfiguration (only a quorum needs to ack), and we don't wait for the members to receive
the COMMIT before completing the reconfig. To stay compatible with other ZK operations, I didn't
want to introduce another round of message exchange.
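
To make the quorum point concrete, here is a toy Python model. It is not ZooKeeper code: the function names are hypothetical, and a simple majority quorum over the new config is assumed for illustration.

```python
def is_quorum(acks, members):
    """True if the acks that come from `members` form a strict majority of `members`."""
    return len(set(acks) & set(members)) > len(members) // 2


def reconfig_completes(new_config, acked):
    """Toy rule: the reconfig completes once a majority of the new config has acked."""
    return is_quorum(acked, new_config)


new_config = {1, 2, 3, 4, 5}
acked = {1, 2, 3}                # servers 4 and 5 never saw the proposal
lagging = new_config - acked     # members that may still hold the old config

print(reconfig_completes(new_config, acked))  # True
print(sorted(lagging))                        # [4, 5]
```

In this model the reconfig completes while servers 4 and 5 are still behind, which is the state the cluster is then expected to recover from.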

But what's required is for the cluster to be able to recover from this state: the server
that didn't get the commit in your scenario should find out about the new config and eventually
join the cluster. If that doesn't happen, then that is potentially a bug, but it's not clear
from the description here.
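
As a rough illustration of what "finding out about the new config" could look like, the sketch below has a lagging server adopt any config it sees with a higher version. The `Config` type, its `version` field, and `reconcile` are hypothetical names for this example, not ZooKeeper's actual classes or recovery path.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    version: int          # hypothetical, monotonically increasing config version
    members: frozenset


def reconcile(local: Config, seen: Config) -> Config:
    """Keep whichever config has the higher version."""
    return seen if seen.version > local.version else local


old = Config(version=1, members=frozenset({1, 2}))
new = Config(version=2, members=frozenset({1, 2, 3, 4, 5}))

# Node X missed the COMMIT and still holds the old config;
# a message from a peer carries the newer one.
node_x = reconcile(old, new)
print(node_x.version)  # 2
```

The comparison is one-directional on purpose: a server that already holds the newer config ignores a stale one, so it cannot be rolled back by a lagging peer.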

> Reconfig Causes Inconsistent Configuration file among the nodes
> ---------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2865
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2865
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: leaderElection, quorum, server
>    Affects Versions: 3.5.3
>            Reporter: Jeffrey F. Lukman
>         Attachments: ZK-2865.pdf
>
>
> When we run our Distributed system Model Checking (DMCK) in ZooKeeper v3.5.3
> by following the workload in ZK-2778:
> - initially start 2 ZooKeeper nodes
> - start 3 new nodes
> - do a reconfiguration (the complete reconfiguration is attached in the document)
> We think our DMCK found the following bug:
> - while one of the newly joined nodes (called node X) has not yet received the latest
>   configuration update, the initial leader node closed its port,
>   thereby isolating node X.
> For complete information on the bug, please see the attached document.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
