hadoop-common-dev mailing list archives

From "Tsz Wo (Nicholas), SZE (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4702) Failed block replication leaves an incomplete block in receiver's tmp data directory
Date Wed, 26 Nov 2008 18:54:44 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4702?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12651103#action_12651103 ]

Tsz Wo (Nicholas), SZE commented on HADOOP-4702:
------------------------------------------------

Block replication and block creation should be treated differently: block creation allows a
partial block, but block replication should be atomic. It should either replicate the entire
block or do nothing.
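
To make the all-or-nothing behaviour concrete, here is a minimal sketch in Java. It is not
the actual DataNode code: the class AtomicBlockReceiver, its method names, and the
expectedLength check are hypothetical, and a plain Set stands in for the ongoingCreates
bookkeeping mentioned in the issue. The block is written under the tmp directory, moved into
place only once it is complete, and both the on-disk and in-memory copies are dropped on any
failure.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical receiver used for illustration only; not the real DataNode classes. */
class AtomicBlockReceiver {
    private final Path tmpDir;    // where in-flight blocks are written
    private final Path finalDir;  // where complete (finalized) blocks live
    // Stand-in for the in-memory ongoingCreates bookkeeping described in the issue.
    private final Set<String> ongoingCreates = ConcurrentHashMap.newKeySet();

    AtomicBlockReceiver(Path tmpDir, Path finalDir) {
        this.tmpDir = tmpDir;
        this.finalDir = finalDir;
    }

    /** Replicates the whole block or leaves no trace of it on this node. */
    void replicate(String blockId, InputStream source, long expectedLength) throws IOException {
        if (!ongoingCreates.add(blockId)) {
            throw new IOException("Block " + blockId + " has already been started");
        }
        Path tmp = tmpDir.resolve(blockId);
        boolean committed = false;
        try {
            long copied;
            try (OutputStream out = Files.newOutputStream(tmp)) {
                copied = source.transferTo(out);
            }
            if (copied != expectedLength) {
                throw new IOException("Incomplete block " + blockId + ": got " + copied
                        + " of " + expectedLength + " bytes");
            }
            // Publish the block only after it is complete. Both directories are
            // assumed to be on the same file system so the move can be atomic.
            Files.move(tmp, finalDir.resolve(blockId), StandardCopyOption.ATOMIC_MOVE);
            committed = true;
        } finally {
            // Always drop the in-memory entry; drop the partial file on failure.
            ongoingCreates.remove(blockId);
            if (!committed) {
                Files.deleteIfExists(tmp);
            }
        }
    }

    boolean isOngoing(String blockId) {
        return ongoingCreates.contains(blockId);
    }
}
```

With cleanup done this way, a later attempt to replicate the same block to this node would
not hit the "already been started" error, and a restart would find nothing under the tmp
directory to promote.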

> Failed block replication leaves an incomplete block in receiver's tmp data directory
> ------------------------------------------------------------------------------------
>
>                 Key: HADOOP-4702
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4702
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.17.2
>            Reporter: Hairong Kuang
>             Fix For: 0.20.0
>
>
> When a failure occurs while replicating a block from a source DataNode to a target DataNode,
> the target node keeps an incomplete on-disk copy of the block in its temp data directory and
> an in-memory copy of the block in the ongoingCreates queue. This causes two problems:
> 1. Since this block is not (and should not be) finalized, the NameNode is not aware of the
> existence of this incomplete block. It may schedule replication of the same block to this
> node again, which will fail with the message: "Block XX has already been started (though not
> completed), and thus cannot be created."
> 2. Restarting the datanode moves the blocks under the temp data directory to be valid
> blocks, thus introducing corrupted blocks into HDFS. Sometimes those corrupted blocks stay
> in the system undetected if the partial block and its checksums happen to match.
> A failed block replication should clean up both the in-memory and on-disk copies of
> the incomplete block.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

