hadoop-common-dev mailing list archives

From "Raghu Angadi (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-1079) DFS Scalability: optimize processing time of block reports
Date Thu, 08 Mar 2007 04:45:24 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-1079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12479224 ]

Raghu Angadi commented on HADOOP-1079:
--------------------------------------


The namenode already knows which blocks were added and deleted between block reports. I think
the block report exists only to catch exceptional cases.

How does a delta help catch differences between the namenode and a datanode? I mean, how can the
two sides be sure that they hold exactly the same set? Maybe they can exchange a hash of all the
block IDs and revert to a normal report only if there is a mismatch. This hash could also be
order-independent, so that the namenode updates it with each block added instead of iterating
over the set when the hash is required.
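
To make the hash idea concrete, here is a rough sketch of what I have in mind. It is only an
illustration under my own assumptions: BlockSetDigest, mix(), and matches() are made-up names,
not existing Hadoop code, and I am assuming block IDs are plain longs. Each side keeps a
wrapping sum of bit-mixed block IDs plus a block count, so adding (or, by the same reasoning,
removing) a block is a constant-time update and the order of updates does not matter.

```java
/** Hypothetical helper, not part of Hadoop: order-independent digest of a set of block IDs. */
class BlockSetDigest {
    private long sum;    // wrapping (mod 2^64) sum of mixed block IDs
    private long count;  // number of blocks currently in the set

    /** Scrambles the bits of a block ID so that nearby IDs contribute unrelated values. */
    static long mix(long h) {
        h ^= h >>> 33;
        h *= 0xff51afd7ed558ccdL;
        h ^= h >>> 33;
        h *= 0xc4ceb9fe1a85ec53L;
        h ^= h >>> 33;
        return h;
    }

    /** O(1) update when a block is added; addition commutes, so order is irrelevant. */
    void add(long blockId) {
        sum += mix(blockId);
        count++;
    }

    /** O(1) update when a block is deleted; exactly undoes a previous add of the same ID. */
    void remove(long blockId) {
        sum -= mix(blockId);
        count--;
    }

    /** True if both sides report the same count and the same digest. */
    boolean matches(BlockSetDigest other) {
        return sum == other.sum && count == other.count;
    }

    public static void main(String[] args) {
        BlockSetDigest namenodeView = new BlockSetDigest();
        BlockSetDigest datanodeView = new BlockSetDigest();
        for (long id : new long[] {1L, 2L, 3L}) namenodeView.add(id);
        for (long id : new long[] {3L, 1L, 2L}) datanodeView.add(id);  // different order
        System.out.println("digests match: " + namenodeView.matches(datanodeView));

        datanodeView.remove(2L);  // datanode silently lost a block
        System.out.println("after lost block: " + namenodeView.matches(datanodeView));
    }
}
```

A match does not strictly prove the two sets are identical, since different sets can collide on
a 64-bit sum, so a full report would still be needed occasionally; a mismatch, however, always
means the sets differ and a normal report should follow.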


> DFS Scalability: optimize processing time of block reports
> ----------------------------------------------------------
>
>                 Key: HADOOP-1079
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1079
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 blocks and sends
> a block report to the namenode once every hour. This means that the namenode processes a block
> report once every 2 seconds. Each block report contains all blocks that the datanode currently
> hosts. This makes the namenode compare a huge number of blocks that practically remain the
> same between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of a full block
> report) be incremental. This will make the namenode process only those blocks that were
> added/deleted in the last period.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

