zookeeper-dev mailing list archives

From "Yongcheng Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2660) acceptedEpoch and currentEpoch data inconsistency, ZK process can not start!
Date Sat, 07 Jan 2017 13:31:58 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15807507#comment-15807507 ]

Yongcheng Liu commented on ZOOKEEPER-2660:
------------------------------------------

For example, suppose we have three ZooKeeper servers (zk1, zk2, zk3): zk1 is down, zk2 is a
follower, and zk3 is the leader. The currentEpoch and acceptedEpoch of zk2 and zk3 are both 6;
the currentEpoch and acceptedEpoch of zk1 are both 5.
Then zk1 starts. When it becomes a follower it calls setAcceptedEpoch(6), which sets the
in-memory acceptedEpoch to 6 before writing the file. If writeLongToFile then fails, the
acceptedEpoch in the file is still 5. An exception is thrown and zk1 returns to LOOKING. This
time zk1 does not call setAcceptedEpoch, because its in-memory acceptedEpoch already matches
the leader's (zk3); zk1 does not know that the acceptedEpoch in the file is still 5. If zk1
goes down again, it will never start up, because in the file acceptedEpoch is 5 and
currentEpoch is 6.
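
The scenario above can be reproduced in isolation. The following is only a minimal sketch, not ZooKeeper code: EpochFile, Peer, failNextWrite and persistFirst are hypothetical names made up for this illustration, and the file is simulated by a plain field. It compares the memory-first ordering with the file-first ordering when the write from epoch 5 to epoch 6 fails.

```java
import java.io.IOException;

public class EpochOrderSketch {
    /** Hypothetical stand-in for the on-disk acceptedEpoch file. */
    static class EpochFile {
        long stored;
        boolean failNextWrite;

        EpochFile(long initial) {
            stored = initial;
        }

        void write(long value) throws IOException {
            if (failNextWrite) {
                failNextWrite = false;
                throw new IOException("simulated write failure");
            }
            stored = value;
        }
    }

    /** Hypothetical peer holding the in-memory epoch and its file. */
    static class Peer {
        long acceptedEpoch;
        final EpochFile file;
        final boolean persistFirst;

        Peer(long epoch, boolean persistFirst) {
            this.acceptedEpoch = epoch;
            this.file = new EpochFile(epoch);
            this.persistFirst = persistFirst;
        }

        void setAcceptedEpoch(long e) throws IOException {
            if (persistFirst) {
                file.write(e);      // proposed order: file first
                acceptedEpoch = e;  // only reached if the write succeeded
            } else {
                acceptedEpoch = e;  // reported order: memory first
                file.write(e);      // a failure here leaves memory ahead of the file
            }
        }
    }

    /** Returns {memory, file} after one failed attempt to move from epoch 5 to 6. */
    static long[] failedUpdate(boolean persistFirst) {
        Peer zk1 = new Peer(5, persistFirst);
        zk1.file.failNextWrite = true;
        try {
            zk1.setAcceptedEpoch(6);
        } catch (IOException expected) {
            // in the scenario, zk1 would return to LOOKING here
        }
        return new long[] { zk1.acceptedEpoch, zk1.file.stored };
    }

    public static void main(String[] args) {
        long[] memoryFirst = failedUpdate(false);
        long[] fileFirst = failedUpdate(true);
        System.out.println("memory-first: memory=" + memoryFirst[0] + " file=" + memoryFirst[1]);
        System.out.println("file-first:   memory=" + fileFirst[0] + " file=" + fileFirst[1]);
    }
}
```

With the memory-first ordering the sketch ends with memory=6 and file=5, which is the divergence described above; with the file-first ordering both stay at 5, so a later retry would still see that the epoch needs to be written.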

> acceptedEpoch and currentEpoch data inconsistency, ZK process can not start!
> ----------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2660
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2660
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.6, 3.4.9
>         Environment: ZK: 3.4.9
>            Reporter: Yongcheng Liu
>
> 1. If currentEpoch is greater than acceptedEpoch, ZK throws an IOException in loadDataBase
> at startup.
> 2. Function bug: setAcceptedEpoch and setCurrentEpoch modify the in-memory variable first
> and then write the epoch to the file. If the file write fails, the in-memory value has
> already been modified.
> Proposed solution. For example, the current code is:
> 	public void setAcceptedEpoch(long e) throws IOException {
> 		acceptedEpoch = e;
> 		writeLongToFile(ACCEPTED_EPOCH_FILENAME, e);
> 	}
> It needs to be changed to:
> 	public void setAcceptedEpoch(long e) throws IOException {
> 		writeLongToFile(ACCEPTED_EPOCH_FILENAME, e);
> 		acceptedEpoch = e;
> 	}
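
Point 1 of the description can be illustrated with a simplified startup check. This is a hedged sketch and not the actual loadDataBase code: checkEpochs and canStart are hypothetical helpers, and the only behaviour taken from the report is that startup fails with an IOException when currentEpoch is greater than acceptedEpoch.

```java
import java.io.IOException;

public class EpochStartupCheckSketch {
    /** Hypothetical check: rejects an acceptedEpoch that is behind currentEpoch. */
    static void checkEpochs(long acceptedEpoch, long currentEpoch) throws IOException {
        if (acceptedEpoch < currentEpoch) {
            throw new IOException("acceptedEpoch " + acceptedEpoch
                    + " is less than currentEpoch " + currentEpoch);
        }
    }

    /** Returns true if the two values read from the files would pass the check. */
    static boolean canStart(long acceptedEpoch, long currentEpoch) {
        try {
            checkEpochs(acceptedEpoch, currentEpoch);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("accepted=6 current=6 starts: " + canStart(6, 6));
        System.out.println("accepted=5 current=6 starts: " + canStart(5, 6));
    }
}
```

The second call corresponds to the files left behind in the reported case (acceptedEpoch 5, currentEpoch 6), which is why the process cannot start again without manual repair.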



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
