[ https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13091544#comment-13091544 ]
Mahadev konar commented on HDFS-395:
------------------------------------
Hairong,
Can you please add the fix version to the jira?
> DFS Scalability: Incremental block reports
> ------------------------------------------
>
> Key: HDFS-395
> URL: https://issues.apache.org/jira/browse/HDFS-395
> Project: Hadoop HDFS
> Issue Type: Sub-task
> Components: data-node, name-node
> Reporter: dhruba borthakur
> Assignee: Tomasz Nykiel
> Attachments: blockReportPeriod.patch, explicitAcks.patch-3, explicitAcks.patch-4, explicitAcks.patch-5, explicitAcks.patch-6, explicitDeleteAcks.patch
>
>
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 blocks and sends a block report to the namenode once every hour. This means that the namenode processes a block report once every 2 seconds (3600 s / 1800 datanodes). Each block report contains all blocks that the datanode currently hosts, so the namenode has to compare a huge number of blocks that practically remain the same between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make subsequent block reports (those sent after a full block report has been successfully delivered) incremental. The namenode would then process only the blocks that were added or deleted during the last period.
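A minimal sketch of the proposal above, written in Java because HDFS is a Java code base. It is not the HDFS implementation or API: the class `IncrementalBlockReportSketch`, its methods, and the modeling of blocks as plain `long` IDs are all hypothetical. The datanode remembers the set of blocks it last reported. After one full report, each later report carries only the blocks added or deleted since then. The baseline moves forward only when the namenode acknowledges the delta, which is an assumption standing in for "successful send".

```java
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not the HDFS API): a datanode-side tracker that sends one
 * full block report, then only the blocks added or deleted since the last
 * acknowledged report. Block IDs are modeled as plain longs.
 */
public class IncrementalBlockReportSketch {

    /** Delta sent to the namenode: blocks added and deleted since the last acknowledged report. */
    public static final class BlockReportDelta {
        public final Set<Long> added;
        public final Set<Long> deleted;

        BlockReportDelta(Set<Long> added, Set<Long> deleted) {
            this.added = added;
            this.deleted = deleted;
        }
    }

    // Blocks the namenode is assumed to know about after the last acknowledged report.
    private final Set<Long> lastReported = new HashSet<>();

    /** Full report: every block currently hosted; it also becomes the new baseline. */
    public Set<Long> fullReport(Set<Long> currentBlocks) {
        lastReported.clear();
        lastReported.addAll(currentBlocks);
        return new HashSet<>(currentBlocks);
    }

    /** Incremental report: only the difference between the current blocks and the baseline. */
    public BlockReportDelta incrementalReport(Set<Long> currentBlocks) {
        Set<Long> added = new HashSet<>(currentBlocks);
        added.removeAll(lastReported);          // hosted now, not yet reported
        Set<Long> deleted = new HashSet<>(lastReported);
        deleted.removeAll(currentBlocks);       // reported before, no longer hosted
        return new BlockReportDelta(added, deleted);
    }

    /** Assumed acknowledgement step: advance the baseline only once the namenode confirms the delta. */
    public void acknowledge(BlockReportDelta delta) {
        lastReported.addAll(delta.added);
        lastReported.removeAll(delta.deleted);
    }

    public static void main(String[] args) {
        IncrementalBlockReportSketch dn = new IncrementalBlockReportSketch();
        dn.fullReport(Set.of(1L, 2L, 3L));
        BlockReportDelta d = dn.incrementalReport(Set.of(2L, 3L, 4L));
        System.out.println("added=" + d.added + " deleted=" + d.deleted);
        dn.acknowledge(d);
        BlockReportDelta d2 = dn.incrementalReport(Set.of(2L, 3L, 4L));
        System.out.println("added=" + d2.added + " deleted=" + d2.deleted);
    }
}
```

In `main`, the first delta carries block 4 as added and block 1 as deleted. The second delta, taken after the acknowledgement, is empty. Diffing full sets on the datanode is only one possible design. A datanode could instead record additions and deletions as they happen and skip the scan. Either way, the namenode's work per report scales with the number of changed blocks rather than with the roughly 50000 blocks each datanode hosts.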
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira