hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1079) DFS Scalability: optimize processing time of block reports
Date Wed, 16 May 2007 22:41:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496425 ]

dhruba borthakur commented on HADOOP-1079:

The BlockReport exists for the following reasons:

1. A user or admin manually deleted a bunch of blk-xxxx files on the datanode.
     This happens rarely, and a daily block report is good enough to rectify this situation.
2. A heartbeat response from the namenode is lost.
     When this occurs, the datanode remembers the condition and sends its next block report
     immediately after the next successful heartbeat.

Given the two cases above, we can set the default block report period to 1 day. This will
reduce the CPU load on the namenode tremendously, especially on large clusters.
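
The two cases above can be sketched as a small piece of datanode-side scheduling logic. This is only an illustration in Python, not the actual Hadoop code: the class name ReportScheduler, its fields, and the on_heartbeat method are all hypothetical, and time is passed in as plain seconds.

```python
from dataclasses import dataclass

DAY_SECONDS = 24 * 60 * 60  # the proposed default block report period (1 day)


@dataclass
class ReportScheduler:
    """Hypothetical datanode-side state; not Hadoop's real classes."""

    period: float = DAY_SECONDS
    last_report: float = 0.0
    response_lost: bool = False

    def on_heartbeat(self, now: float, response_received: bool) -> bool:
        """Return True if a full block report should be sent now."""
        if not response_received:
            # Case 2: remember that a heartbeat response was lost.
            self.response_lost = True
            return False
        # Send right after the first successful heartbeat that follows a
        # lost response, or when the regular (daily) period has elapsed.
        due = self.response_lost or (now - self.last_report) >= self.period
        if due:
            self.response_lost = False
            self.last_report = now
        return due
```

With this sketch, a lost response followed by a successful heartbeat triggers a report at once, while an undisturbed datanode reports only once per period.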

> DFS Scalability: optimize processing time of block reports
> ----------------------------------------------------------
>                 Key: HADOOP-1079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1079
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 blocks and sends
> a block report to the namenode once every hour. This means that the namenode processes a block
> report once every 2 seconds. Each block report contains all blocks that the datanode currently
> hosts. This makes the namenode compare a huge number of blocks that remain practically the
> same between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of a full block
> report) incremental. This will make the namenode process only those blocks that were
> added/deleted in the last period.
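
The arithmetic in the issue description, and the idea behind an incremental report, can be illustrated with a short Python sketch. The function names are hypothetical, and modelling block IDs as plain integers in sets is an assumption made only for illustration; it does not reflect how Hadoop represents blocks.

```python
from typing import Set, Tuple


def seconds_between_reports(datanodes: int, period_seconds: float) -> float:
    """Average gap between block reports arriving at the namenode."""
    return period_seconds / datanodes


def incremental_report(previous: Set[int], current: Set[int]) -> Tuple[Set[int], Set[int]]:
    """Return (added, deleted) block IDs relative to the previous report."""
    return current - previous, previous - current


# 1800 datanodes reporting hourly: one report every 2 seconds.
hourly_gap = seconds_between_reports(1800, 60 * 60)
# The same cluster reporting daily: one report every 48 seconds.
daily_gap = seconds_between_reports(1800, 24 * 60 * 60)
```

An incremental report for a datanode that previously held blocks {1, 2, 3} and now holds {2, 3, 4} would carry only the added block 4 and the deleted block 1, rather than the full list.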
