hadoop-common-dev mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1079) DFS Scalability: optimize processing time of block reports
Date Thu, 17 May 2007 00:03:16 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12496443 ]

dhruba borthakur commented on HADOOP-1079:

A short discussion with Sameer revealed that the case where an admin or a rogue process manually
deletes some blk-xxx files cannot be handled by a daily block report. A proposal is to make
the datanode compare its in-memory data structures with what is on disk. If it finds inconsistencies,
it sends a complete block report to the namenode.
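A minimal sketch of that consistency check might look like the following. All class and method names here are illustrative assumptions, not actual Hadoop code: the datanode diffs its in-memory block set against the block files actually present on disk, and flags a full block report only when they disagree.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the disk-vs-memory consistency check described
// above. Names are illustrative, not from the Hadoop code base.
class BlockConsistencyChecker {

    /** Returns true if a full block report should be sent to the namenode. */
    static boolean needsFullReport(Set<Long> inMemoryBlockIds,
                                   Set<Long> onDiskBlockIds) {
        // Any block present in memory but missing on disk (e.g. a blk-xxx
        // file deleted by an admin or rogue process), or vice versa, is an
        // inconsistency that a periodic incremental report would miss.
        return !inMemoryBlockIds.equals(onDiskBlockIds);
    }

    public static void main(String[] args) {
        Set<Long> memory = new HashSet<>(Set.of(1L, 2L, 3L));
        Set<Long> disk = new HashSet<>(Set.of(1L, 3L)); // blk-2 deleted manually
        System.out.println(needsFullReport(memory, disk)); // prints "true"
    }
}
```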

In response to Raghu's comments: I agree that sending a block report "immediately" after a
successful heartbeat (one that was preceded by a failed heartbeat) could add more load on the
namenode. It is enough to "hasten" the next block report rather than sending it immediately.
I will work on this fix.
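One way to sketch that "hasten rather than send immediately" idea is as a small scheduling adjustment. The class, fields, and the 5-minute constant below are assumptions for illustration only, not the actual fix:

```java
// Hypothetical sketch of hastening the next block report after a
// heartbeat recovery, instead of sending one immediately. Names and
// constants are illustrative, not from the Hadoop code base.
class BlockReportScheduler {
    static final long FULL_INTERVAL_MS = 60L * 60 * 1000; // hourly reports
    static final long HASTENED_DELAY_MS = 5L * 60 * 1000; // assumed 5 minutes

    long nextReportTimeMs;

    BlockReportScheduler(long nowMs) {
        nextReportTimeMs = nowMs + FULL_INTERVAL_MS;
    }

    /** Called when a heartbeat succeeds right after a failed one. */
    void onHeartbeatRecovered(long nowMs) {
        // Pull the next report forward, but never push it later and never
        // send immediately, so recovering datanodes don't flood the
        // namenode with simultaneous block reports.
        nextReportTimeMs = Math.min(nextReportTimeMs, nowMs + HASTENED_DELAY_MS);
    }
}
```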

> DFS Scalability: optimize processing time of block reports
> ----------------------------------------------------------
>                 Key: HADOOP-1079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1079
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>         Assigned To: dhruba borthakur
>         Attachments: blockReportPeriod.patch
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 blocks and sends
> a block report to the namenode once every hour. This means that the namenode processes a block
> report once every 2 seconds. Each block report contains all blocks that the datanode currently
> hosts. This makes the namenode compare a huge number of blocks that remain practically the
> same between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of a full block
> report) incremental. This will make the namenode process only those blocks that were added or
> deleted in the last period.
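The incremental-report proposal quoted above can be sketched as a simple set difference between the previous and current block sets. This is an illustrative assumption about the shape of such a report, not Hadoop's implementation:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of an incremental block report: after one
// successful full report, later reports carry only the blocks added
// or deleted since the previous report. Illustrative names only.
class IncrementalReport {
    final Set<Long> added;
    final Set<Long> deleted;

    IncrementalReport(Set<Long> previous, Set<Long> current) {
        added = new HashSet<>(current);
        added.removeAll(previous);   // blocks new since the last report
        deleted = new HashSet<>(previous);
        deleted.removeAll(current);  // blocks gone since the last report
    }
}
```

With ~50000 blocks per datanode and few changes per hour, the namenode would process a handful of deltas instead of re-comparing the full block list every report.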

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
