hadoop-hdfs-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Created: (HDFS-550) DataNode restarts may introduce corrupt/duplicated/lost replicas when handling detached replicas
Date Mon, 17 Aug 2009 23:18:14 GMT
DataNode restarts may introduce corrupt/duplicated/lost replicas when handling detached replicas
------------------------------------------------------------------------------------------------

                 Key: HDFS-550
                 URL: https://issues.apache.org/jira/browse/HDFS-550
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: data-node
    Affects Versions: 0.21.0
            Reporter: Hairong Kuang
            Assignee: Hairong Kuang
            Priority: Blocker
             Fix For: Append Branch


The current trunk first unlinks a finalized replica before appending to the block. The unlink
is done by temporarily copying the block file from the "current" subtree to a directory called
"detach" under the volume's data directory, and then copying it back once the unlink succeeds.
On restart, a datanode recovers a failed unlink by copying the replicas under "detach" back to
"current".
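
A minimal sketch of the copy-based unlink flow described above, for illustration only; the
class and method names are hypothetical and do not match the actual DataNode code:

    import java.io.File;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.StandardCopyOption;

    // Illustrative sketch of the detach-based unlink; names are hypothetical.
    public class DetachSketch {

      static void unlinkBlock(File blockFile, File detachDir) throws IOException {
        File detachedCopy = new File(detachDir, blockFile.getName());
        // Step 1: copy the finalized block file into the volume's "detach" directory.
        Files.copy(blockFile.toPath(), detachedCopy.toPath(),
                   StandardCopyOption.REPLACE_EXISTING);
        // Step 2: move the private copy back over the original, breaking any hard
        // link shared with a snapshot. A crash between the two steps leaves the
        // copy behind in "detach" for restart recovery to finish.
        Files.move(detachedCopy.toPath(), blockFile.toPath(),
                   StandardCopyOption.REPLACE_EXISTING);
      }
    }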

There are two bugs in this implementation:
1. The "detach" directory is not included in a snapshot, so a rollback will cause the
"detaching" replicas to be lost.
2. After a replica is copied to the "detach" directory, the information about its original
location is lost. The current implementation erroneously assumes that the replica to be
unlinked lives directly under "current". This can leave two replica instances with the same
block id coexisting on a datanode. Also, if the replica under "detach" is corrupt, the corrupt
replica is moved to "current" without being detected, polluting the datanode's data (see the
sketch below).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

