hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-5479) NameNode should not send empty block replication request to DataNode
Date Thu, 12 Mar 2009 21:22:50 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-5479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-5479:

    Attachment: numTransfers.patch

This patch makes three changes:
# The NameNode now interprets numOfTransfers as numOfBlocks to be replicated. The current code
interprets it as numOfTargets to be replicated to. This change is made in
DatanodeDescriptor#BlockTargetPair.poll() and prevents empty replication requests as well as
empty recovery requests.
# The number of targets to be chosen is no longer capped by the number of transfers. Again, the
NameNode should not treat the number of transfers as the number of targets.
# The third change is not directly related to this issue, but I saw the problem while debugging
it. The current code moves a block to the pending replication queue only when it reaches its
replication factor. This sometimes causes over-replication because not all pending replications
are tracked. This patch adds a block to the pending replication queue as soon as one
replication is scheduled for it.
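
A minimal sketch of the first change, for illustration only. It is not taken from
numTransfers.patch: the class BlockQueueSketch and its offer/poll methods are hypothetical
stand-ins for the NameNode's per-DataNode replication queue, and block and DataNode names are
plain strings. It assumes the limit passed to poll() counts blocks, however many targets each
block has, and that poll() returns null rather than an empty list so the caller has nothing to
send.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

/** Hypothetical per-DataNode queue of blocks waiting to be replicated. */
public class BlockQueueSketch {

    /** A block paired with the DataNodes it should be copied to. */
    public static final class BlockTargetPair {
        public final String blockId;
        public final List<String> targets;

        BlockTargetPair(String blockId, List<String> targets) {
            this.blockId = blockId;
            this.targets = targets;
        }
    }

    private final Queue<BlockTargetPair> queue = new ArrayDeque<>();

    /** Enqueue a block; a block with no targets is never queued. */
    public void offer(String blockId, List<String> targets) {
        if (targets == null || targets.isEmpty()) {
            return; // nothing to transfer, so nothing to schedule
        }
        queue.add(new BlockTargetPair(blockId, new ArrayList<>(targets)));
    }

    /**
     * Dequeue at most maxBlocks entries. The limit counts blocks, not targets:
     * one block replicated to three targets through a pipeline is one transfer.
     * Returns null (never an empty list) when there is nothing to send.
     */
    public List<BlockTargetPair> poll(int maxBlocks) {
        if (maxBlocks <= 0 || queue.isEmpty()) {
            return null;
        }
        List<BlockTargetPair> result = new ArrayList<>();
        while (result.size() < maxBlocks && !queue.isEmpty()) {
            result.add(queue.remove());
        }
        return result;
    }
}
```

In this sketch a caller building a heartbeat reply would attach a replication command only when
poll() returns a non-null list, which is one way to guarantee that no empty request reaches the
DataNode.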

> NameNode should not send empty block replication request to DataNode
> --------------------------------------------------------------------
>                 Key: HADOOP-5479
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5479
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: Hairong Kuang
>            Assignee: Hairong Kuang
>            Priority: Blocker
>             Fix For: 0.19.2, 0.20.0, 0.21.0
>         Attachments: numTransfers.patch
> On our production clusters, we occasionally see that NameNode sends an empty block replication
> request to DataNode on every heartbeat, thus blocking this DataNode from replicating or deleting
> any block.
> This is partly caused by DataNode sending a wrong number of replications in progress,
> which will be fixed by HADOOP-5465. There is also a flaw at the NameNode side. NameNode should
> not interpret the number of replications in progress as the number of targets since replication
> is done through a pipeline. It also should make sure that no empty replication request is
> sent to DataNode.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
