hadoop-common-dev mailing list archives

From "Hairong Kuang (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5034) NameNode should send both replication and deletion requests to DataNode in one reply to a heartbeat
Date Fri, 16 Jan 2009 23:09:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664730#action_12664730 ]

Hairong Kuang commented on HADOOP-5034:
---------------------------------------

I do not think fine-tuning the limit will solve the problem. What happened was that block
replication starved block deletion. When I read the ReplicationMonitor code, I found that
block deletion does not get scheduled for any datanode as long as even one replication work
item is scheduled anywhere in the cluster. This explains why no block deletion was observed
at all.

> NameNode should send both replication and deletion requests to DataNode in one reply to a heartbeat
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-5034
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5034
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0
>
>
> Currently the NameNode favors block replication requests over deletion requests. In its
> reply to a heartbeat, the NameNode does not send a block deletion request unless there is
> no block replication request.
> This causes a problem when a near-full cluster loses a number of DataNodes. In reaction to
> the DataNode loss, the NameNode starts to replicate blocks. However, replication takes a lot
> of CPU, and many replications fail because of the lack of disk space. So the administrator
> tries to delete some DFS files to free up space. However, block deletion requests get delayed
> for a very long time because it takes a long time to drain the block replication requests
> for most DataNodes.
> I'd like to propose letting the NameNode send both replication requests and deletion
> requests to DataNodes in one reply to a heartbeat. This also implies that the replication
> monitor should schedule both replication and deletion work in one iteration.
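
The difference between the current and the proposed heartbeat reply can be sketched roughly as follows. This is only an illustration, not the actual NameNode code: the class HeartbeatReplySketch, the Command and DatanodeWork types, and the per-reply limits are all hypothetical names made up for this example.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Illustrative sketch only; none of these names are real Hadoop classes.
public class HeartbeatReplySketch {
    /** Hypothetical command kinds a heartbeat reply may carry. */
    enum Kind { REPLICATE, DELETE }

    /** One command carrying a batch of block IDs for a single datanode. */
    static final class Command {
        final Kind kind;
        final List<Long> blockIds;
        Command(Kind kind, List<Long> blockIds) {
            this.kind = kind;
            this.blockIds = blockIds;
        }
    }

    /** Pending work for one datanode (assumed bookkeeping, for illustration). */
    static final class DatanodeWork {
        final Queue<Long> toReplicate = new ArrayDeque<>();
        final Queue<Long> toDelete = new ArrayDeque<>();
    }

    /** Removes up to max entries from the queue and returns them in order. */
    static List<Long> drain(Queue<Long> queue, int max) {
        List<Long> out = new ArrayList<>();
        while (out.size() < max && !queue.isEmpty()) {
            out.add(queue.poll());
        }
        return out;
    }

    /** Behavior described in the issue: deletion is sent only if no replication was sent. */
    static List<Command> replyReplicationFirst(DatanodeWork work, int maxRepl, int maxDel) {
        List<Command> reply = new ArrayList<>();
        List<Long> repl = drain(work.toReplicate, maxRepl);
        if (!repl.isEmpty()) {
            reply.add(new Command(Kind.REPLICATE, repl));
            return reply; // deletion work waits for a later heartbeat
        }
        List<Long> del = drain(work.toDelete, maxDel);
        if (!del.isEmpty()) {
            reply.add(new Command(Kind.DELETE, del));
        }
        return reply;
    }

    /** Proposed behavior: one reply may carry both a replication and a deletion command. */
    static List<Command> replyBoth(DatanodeWork work, int maxRepl, int maxDel) {
        List<Command> reply = new ArrayList<>();
        List<Long> repl = drain(work.toReplicate, maxRepl);
        if (!repl.isEmpty()) {
            reply.add(new Command(Kind.REPLICATE, repl));
        }
        List<Long> del = drain(work.toDelete, maxDel);
        if (!del.isEmpty()) {
            reply.add(new Command(Kind.DELETE, del));
        }
        return reply;
    }

    public static void main(String[] args) {
        DatanodeWork work = new DatanodeWork();
        work.toReplicate.add(1L);
        work.toReplicate.add(2L);
        work.toDelete.add(10L);
        for (Command c : replyBoth(work, 1, 5)) {
            System.out.println(c.kind + " " + c.blockIds);
        }
    }
}
```

With replyReplicationFirst, a datanode whose replication queue never empties never receives a deletion command, which is the starvation described above. With replyBoth, deletions make progress on every heartbeat regardless of the replication backlog.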

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

