hadoop-hdfs-dev mailing list archives

From "Todd Lipcon (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HDFS-4799) Corrupt replica can be prematurely removed from corruptReplicas map
Date Mon, 06 May 2013 00:14:15 GMT
Todd Lipcon created HDFS-4799:

             Summary: Corrupt replica can be prematurely removed from corruptReplicas map
                 Key: HDFS-4799
                 URL: https://issues.apache.org/jira/browse/HDFS-4799
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: namenode
    Affects Versions: 2.0.4-alpha
            Reporter: Todd Lipcon
            Assignee: Todd Lipcon
            Priority: Blocker

We saw the following sequence of events on a cluster result in the loss of the most recent
genstamp of a block:
- a client is writing to a pipeline of 3 nodes
- nodes in the pipeline failed over some period of time, leaving 3 old-genstamp replicas
on the original three nodes, while the pipeline recruited 3 new replicas with a later genstamp
-- so we have 6 total replicas in the cluster: 3 with old genstamps on downed nodes,
and 3 with the latest genstamp
- the cluster reboots, and the nodes with old genstamps send their block reports first. The
replicas are correctly added to the corrupt replicas map since their genstamp is too old
- the nodes with the new genstamp then send their block reports. When the last one block
reports, chooseExcessReplicates is called and incorrectly decides to remove the three good
replicas, leaving only the old-genstamp replicas.
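
The scenario above can be restated as a small toy model. This is only an illustrative sketch in
Python, not NameNode code: the Replica type, the helper names, the node names dn1..dn6 and the
genstamp values 1 and 2 are all assumptions made up for the example. It does not reproduce
chooseExcessReplicates; it only checks the property that was violated, namely that at least one
replica with the latest genstamp must survive any removal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    node: str      # hypothetical datanode name
    genstamp: int  # generation stamp carried by this replica


def stale_replicas(replicas, latest_genstamp):
    """Replicas whose genstamp is older than the block's latest genstamp."""
    return [r for r in replicas if r.genstamp < latest_genstamp]


def loses_latest_genstamp(replicas, to_remove):
    """True if removing `to_remove` leaves no replica at the newest genstamp."""
    latest = max(r.genstamp for r in replicas)
    survivors = [r for r in replicas if r not in to_remove]
    return not any(r.genstamp == latest for r in survivors)


# Six replicas as in the report: three old-genstamp, three new-genstamp.
old = [Replica(f"dn{i}", 1) for i in (1, 2, 3)]
new = [Replica(f"dn{i}", 2) for i in (4, 5, 6)]
replicas = old + new

corrupt = stale_replicas(replicas, latest_genstamp=2)
bad_choice = new       # the outcome described in the report
safe_choice = corrupt  # removing only stale replicas keeps the latest data
```

Under these assumptions, loses_latest_genstamp(replicas, bad_choice) is true and
loses_latest_genstamp(replicas, safe_choice) is false, which is the distinction a fix would
need to preserve.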
