zookeeper-dev mailing list archives

From "Hadoop QA (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (ZOOKEEPER-2033) zookeeper follower fails to start after a restart immediately following a new epoch
Date Fri, 01 May 2015 01:57:06 GMT

    [ https://issues.apache.org/jira/browse/ZOOKEEPER-2033?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14522647#comment-14522647 ]

Hadoop QA commented on ZOOKEEPER-2033:
--------------------------------------

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12667913/ZOOKEEPER-2033-3.4.patch
  against trunk revision 1676359.

    +1 @author.  The patch does not contain any @author tags.

    +1 tests included.  The patch appears to include 3 new or modified tests.

    -1 patch.  The patch command could not apply the patch.

Console output: https://builds.apache.org/job/PreCommit-ZOOKEEPER-Build/2661//console

This message is automatically generated.

> zookeeper follower fails to start after a restart immediately following a new epoch
> -----------------------------------------------------------------------------------
>
>                 Key: ZOOKEEPER-2033
>                 URL: https://issues.apache.org/jira/browse/ZOOKEEPER-2033
>             Project: ZooKeeper
>          Issue Type: Bug
>          Components: quorum
>    Affects Versions: 3.4.6
>            Reporter: Asad Saeed
>            Assignee: Asad Saeed
>             Fix For: 3.4.7
>
>         Attachments: ZOOKEEPER-2033-3.4.patch, ZOOKEEPER-2033.patch
>
>
> The following issue was seen when adding a new node to a ZooKeeper cluster.
> Reproduction steps:
> 1. Create a 2-node ensemble. Write some keys.
> 2. Add another node to the ensemble by modifying the config. Restart the entire cluster.
> 3. Restart the new node before writing any new keys.
> What occurs is that the new node gets a SNAP from the newly elected leader, since it
> is too far behind. The zxid for this snapshot is from the new epoch, but that zxid is not
> in the committed log cache.
> On restart of this new node, the follower sends the new-epoch zxid. The leader looks
> at its maxCommitted log, sees that it is not from the newest epoch, and therefore sends a
> TRUNC.
> The follower sees the TRUNC, but it only has a snapshot, so it cannot truncate!
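
The sequence described above can be illustrated with a small, simplified model. This is only a sketch, not ZooKeeper's actual leader code: the class name SyncDecisionSketch, the chooseSync method, and the order of its checks are assumptions made for illustration, and the zxid values are invented. The one fact relied on is that a zxid carries the epoch in its high 32 bits and a counter in its low 32 bits, so any zxid from a newer epoch compares greater than every zxid from an older one.

```java
// Simplified model of the sync decision described in the report.
// NOT ZooKeeper's real implementation; names and check order are assumptions.
final class SyncDecisionSketch {
    enum SyncType { DIFF, TRUNC, SNAP }

    // A zxid packs the epoch into the high 32 bits and a counter into the low 32 bits.
    static long makeZxid(long epoch, long counter) {
        return (epoch << 32) | (counter & 0xffffffffL);
    }

    // Hypothetical decision rule based only on the committed log range.
    static SyncType chooseSync(long peerLastZxid, long minCommittedLog, long maxCommittedLog) {
        if (peerLastZxid > maxCommittedLog) {
            return SyncType.TRUNC; // peer appears to be ahead of the committed log
        }
        if (peerLastZxid >= minCommittedLog) {
            return SyncType.DIFF;  // peer is within the committed log range
        }
        return SyncType.SNAP;      // peer is too far behind
    }

    public static void main(String[] args) {
        // Committed log cache still holds only old-epoch (epoch 1) entries.
        long minCommittedLog = makeZxid(1, 1);
        long maxCommittedLog = makeZxid(1, 5);

        // First sync: the new node has no data, so it is too far behind.
        System.out.println(chooseSync(0L, minCommittedLog, maxCommittedLog));

        // After the SNAP, the node's last zxid is from the new epoch (epoch 2).
        long followerSnapshotZxid = makeZxid(2, 0);

        // On restart it reports that zxid, which exceeds maxCommittedLog.
        System.out.println(chooseSync(followerSnapshotZxid, minCommittedLog, maxCommittedLog));
    }
}
```

Under these assumptions the first call yields SNAP and the second yields TRUNC, even though the follower is not actually ahead of the leader: the new-epoch zxid is simply absent from the committed log cache, and the follower has no transaction log to truncate.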



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
