hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5034) NameNode should send both replication and deletion requests to DataNode in one reply to a heartbeat
Date Fri, 16 Jan 2009 19:22:59 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12664635#action_12664635 ]

Raghu Angadi commented on HADOOP-5034:

> I think we should first try to fine-tune the number of blocks we send for deletion. Currently
we send 100 [...]

I think this limit is a related but different issue. Even when it was implemented, it was supposed
to be a workaround for how DN handles deletion. We should either remove it or set a very large
limit at NN and let DN handle deleting a large number of blocks properly (say, in a separate
thread from the heartbeat thread). This fix was proposed quite a few times but we never
implemented it. Trying to fine-tune the limit only prolongs the problem.
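
To make the "separate thread" idea concrete, here is a minimal sketch of how DN could hand block deletions to a background worker so the heartbeat thread returns right away. This is only an illustration under assumptions: the class `AsyncBlockDeleter`, its methods, and the placeholder `deleteBlock` are hypothetical names and are not taken from the Hadoop code base.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: none of these names come from the Hadoop code base.
class AsyncBlockDeleter {
    // Single worker thread so deletions never run on the heartbeat thread.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicInteger deleted = new AtomicInteger();

    // Called from the heartbeat thread; only queues work and returns immediately.
    void scheduleDeletion(List<Long> blockIds) {
        for (long id : blockIds) {
            worker.execute(() -> deleteBlock(id));
        }
    }

    // Placeholder for the real disk work (removing block and metadata files).
    private void deleteBlock(long blockId) {
        deleted.incrementAndGet();
    }

    int deletedCount() {
        return deleted.get();
    }

    // Stops accepting new work and waits for the queued deletions to finish.
    boolean shutdownAndWait(long timeoutMs) {
        worker.shutdown();
        try {
            return worker.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With something along these lines, the size of a deletion request no longer affects how long the heartbeat thread is blocked, which is why the limit at NN could be raised or dropped.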

> If fine-tuning will not solve the problem we can go on with the modifications to the

It does not look like changing this limit will fix the issue, since NN never gets to send any
blocks to delete. Logically, I don't see any reason why NN cannot send both replication and
deletion requests in the same response to DN.
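
As a rough illustration of a single response carrying both kinds of work, the sketch below builds one heartbeat reply from two pending queues. It is a simplified model under assumptions, not the actual NN code: `HeartbeatReplySketch`, `Command`, `Action`, and the per-reply limits `maxRepl` and `maxDel` are hypothetical names introduced here.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical sketch: names and limits are illustrative, not Hadoop's API.
class HeartbeatReplySketch {
    enum Action { REPLICATE, DELETE }

    static final class Command {
        final Action action;
        final List<Long> blockIds;

        Command(Action action, List<Long> blockIds) {
            this.action = action;
            this.blockIds = blockIds;
        }
    }

    private final Queue<Long> pendingReplication = new ArrayDeque<>();
    private final Queue<Long> pendingDeletion = new ArrayDeque<>();

    void addReplication(long blockId) { pendingReplication.add(blockId); }
    void addDeletion(long blockId) { pendingDeletion.add(blockId); }

    // Builds one reply holding up to maxRepl replication blocks and up to
    // maxDel deletion blocks, so pending replication work cannot starve deletion.
    List<Command> buildReply(int maxRepl, int maxDel) {
        List<Command> reply = new ArrayList<>();
        List<Long> repl = drain(pendingReplication, maxRepl);
        if (!repl.isEmpty()) {
            reply.add(new Command(Action.REPLICATE, repl));
        }
        List<Long> del = drain(pendingDeletion, maxDel);
        if (!del.isEmpty()) {
            reply.add(new Command(Action.DELETE, del));
        }
        return reply;
    }

    // Removes at most max entries from the head of the queue.
    private static List<Long> drain(Queue<Long> queue, int max) {
        List<Long> out = new ArrayList<>();
        while (out.size() < max && !queue.isEmpty()) {
            out.add(queue.poll());
        }
        return out;
    }
}
```

In this model a DN with a long replication backlog still receives its deletion blocks on the very next heartbeat, instead of waiting for the replication queue to drain.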

> NameNode should send both replication and deletion requests to DataNode in one reply
to a heartbeat
> --------------------------------------------------------------------------------------------------------
>                 Key: HADOOP-5034
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5034
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Hairong Kuang
>             Fix For: 0.21.0
> Currently NameNode favors block replication requests over deletion requests. On reply
to a heartbeat, NameNode does not send a block deletion request unless there is no block replication
> This causes a problem when a near-full cluster loses a bunch of DataNodes. In reaction to
the DataNode loss, NameNode starts to replicate blocks. However, replication takes a lot of
CPU and a lot of replications fail because of the lack of disk space. So the administrator
tries to delete some DFS files to free up space. However, block deletion requests get delayed
for a very long time because it takes a long time to drain the block replication requests for
most DataNodes.
> I'd like to propose letting NameNode send both replication requests and deletion requests
to DataNodes in one reply to a heartbeat. This also implies that the replication monitor should
schedule both replication and deletion work in one iteration.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
