hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1349) Corrupted blocks get deleted but not replicated
Date Tue, 15 May 2007 06:55:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12495875 ]

dhruba borthakur commented on HADOOP-1349:
------------------------------------------

I will look at the code more closely, but the approach sounds pretty good. As we discussed,
this approach still cannot solve the race condition entirely: it ensures that the namenode
has sent out the delete-block request before attempting to allocate the same block on the
same datanode, but these requests could still get *processed* on the datanode out of order.
This fix reduces the race window to a minimum.
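As a rough illustration of why the window cannot be closed completely (a sketch only; the
class and method names below are hypothetical, not the actual Hadoop datanode code): the
delete command and the incoming replica can be handled by independent threads on the
datanode, so dispatch order does not guarantee completion order.

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // Hypothetical sketch, not the real Hadoop datanode code.
    public class DataNodeSketch {
        private final ExecutorService deleter = Executors.newSingleThreadExecutor();

        // A delete command taken from a heartbeat reply is queued to a
        // background worker rather than executed inline.
        void onInvalidate(String blockId) {
            deleter.submit(() -> deleteReplica(blockId));
        }

        // An incoming replica transfer is handled as soon as it arrives, on
        // the data-transfer thread. If it arrives while the delete above is
        // still queued, the fresh copy can be written first and then removed
        // by the stale delete: the requests get *processed* out of order
        // even though the namenode *sent* them in the right order.
        void onIncomingReplica(String blockId, byte[] data) {
            writeReplica(blockId, data);
        }

        private void deleteReplica(String blockId) { /* stub */ }
        private void writeReplica(String blockId, byte[] data) { /* stub */ }
    }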

The other side effect is that a client that opens the file might try a bad block replica for
longer (because the block stays in the blocksmap longer), but this irritant should be minor
and can be ignored.

> Corrupted blocks get deleted but not replicated
> -----------------------------------------------
>
>                 Key: HADOOP-1349
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1349
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>         Assigned To: Hairong Kuang
>             Fix For: 0.14.0
>
>         Attachments: blockInvalidate.patch
>
>
> When I test the patch to HADOOP-1345 on a two-node dfs cluster, I see that dfs correctly
deletes the corrupted replica and successfully retries reading from the other, correct replica,
but the block does not get replicated. The block remains with only 1 replica until the next
block report comes in.
> In my test case, since the dfs cluster has only 2 datanodes, the target of replication
is the same as the target of block invalidation. After poking through the logs, I found that
the namenode sent the replication request before the block invalidation request.
> This is because the namenode does not order block invalidation correctly. In FSNamesystem.invalidateBlock,
it first puts the invalidate request in a queue and then immediately removes the replica from
its state, which triggers the selection of a replication target for the block. When requests
are sent back to the target datanode as a reply to a heartbeat message, the replication requests
have higher priority than the invalidate requests.
> This problem could be solved if a namenode removes an invalidated replica from its state
only after the invalidate request is sent to the datanode.
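
For context, a minimal sketch of the ordering the description above proposes (the names here
are hypothetical and do not match the actual FSNamesystem code or the attached patch): keep
the replica in the namenode's state until the queued delete has actually been shipped, so
that re-replication is triggered only after the delete request is on its way.

    // Hypothetical sketch of the proposed ordering; the names here do not
    // match the actual FSNamesystem code.
    class InvalidateSketch {
        static class Block {}
        static class DatanodeInfo {}

        // Step 1: queue the delete, but keep the replica in the namenode's
        // state so the block is not yet seen as under-replicated.
        void invalidateBlock(Block b, DatanodeInfo dn) {
            queueInvalidate(b, dn);
        }

        // Step 2: run once the queued delete has been handed to the datanode
        // in a heartbeat reply. Removing the replica only now makes the block
        // under-replicated and triggers target selection, so the resulting
        // replication request is necessarily sent after the delete request.
        void onInvalidateSent(Block b, DatanodeInfo dn) {
            removeStoredBlock(b, dn);
        }

        private void queueInvalidate(Block b, DatanodeInfo dn) { /* stub */ }
        private void removeStoredBlock(Block b, DatanodeInfo dn) { /* stub */ }
    }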

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

