zookeeper-dev mailing list archives

From "Benedict Jin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-1277) servers stop serving when lower 32bits of zxid roll over
Date Wed, 24 May 2017 02:35:05 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-1277?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16022229#comment-16022229
] 

Benedict Jin commented on ZOOKEEPER-1277:
-----------------------------------------

I created a new JIRA issue, ZOOKEEPER-2789, to discuss reassigning the `ZXID` to solve the 32-bit
overflow problem. Could you please offer some advice on it?

> servers stop serving when lower 32bits of zxid roll over
> --------------------------------------------------------
>
>                 Key: ZOOKEEPER-1277
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-1277
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: server
>    Affects Versions: 3.3.3
>            Reporter: Patrick Hunt
>            Assignee: Patrick Hunt
>            Priority: Critical
>             Fix For: 3.3.5, 3.4.4, 3.5.0
>
>         Attachments: ZOOKEEPER-1277_br33.patch, ZOOKEEPER-1277_br33.patch, ZOOKEEPER-1277_br33.patch,
> ZOOKEEPER-1277_br33.patch, ZOOKEEPER-1277_br34.patch, ZOOKEEPER-1277_br34.patch, ZOOKEEPER-1277_trunk.patch,
> ZOOKEEPER-1277_trunk.patch
>
>
> When the lower 32 bits of a zxid "roll over" (a zxid is a 64-bit number whose upper 32 bits
> are treated as the epoch number), the epoch number is incremented and the lower 32 bits
> start at 0 again.
> This should work fine; however, in the current 3.3 branch the followers see this as a
> NEWLEADER message, which it is not, and effectively stop serving clients. Attached clients
> seem to eventually time out, given that heartbeats (or any other operation) are no longer
> processed. The follower doesn't recover from this.
> I've tested this on the 3.3 branch and confirmed the problem; however, I haven't tried
> it on 3.4/3.5. It may not happen on the newer branches due to ZOOKEEPER-335; however, there
> is certainly an issue with updating the "acceptedEpoch" files contained in the datadir. (I'll
> enter a separate jira for that.)
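
The rollover described in the quoted report can be illustrated with a small sketch. It is not ZooKeeper's code: the helper names (`make_zxid`, `epoch_of`, `counter_of`, `naive_next_zxid`) are hypothetical, and the only assumption taken from the report is the layout, with the epoch in the upper 32 bits and the counter in the lower 32 bits of a 64-bit value.

```python
# Illustrative sketch only; these helpers are hypothetical, not ZooKeeper APIs.
# Layout assumed from the report: upper 32 bits = epoch, lower 32 bits = counter.

COUNTER_BITS = 32
COUNTER_MASK = (1 << COUNTER_BITS) - 1   # 0xFFFFFFFF
ZXID_MASK = (1 << 64) - 1                # keep values within 64 bits


def make_zxid(epoch, counter):
    """Pack an epoch and a counter into a single 64-bit zxid."""
    return ((epoch << COUNTER_BITS) | (counter & COUNTER_MASK)) & ZXID_MASK


def epoch_of(zxid):
    """Return the upper 32 bits of the zxid."""
    return (zxid >> COUNTER_BITS) & COUNTER_MASK


def counter_of(zxid):
    """Return the lower 32 bits of the zxid."""
    return zxid & COUNTER_MASK


def naive_next_zxid(zxid):
    """Plain 64-bit increment: a full counter carries into the epoch bits."""
    return (zxid + 1) & ZXID_MASK


if __name__ == "__main__":
    last = make_zxid(5, COUNTER_MASK)    # counter is at its maximum
    nxt = naive_next_zxid(last)          # carry bumps the epoch, counter restarts
    print(epoch_of(last), counter_of(last))
    print(epoch_of(nxt), counter_of(nxt))
```

With a plain increment, the zxid that follows epoch 5 with a full counter reads as epoch 6 with counter 0, which is the apparent epoch change that, per the report, followers on the 3.3 branch mistake for a NEWLEADER message.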



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
