hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-395) DFS Scalability: Incremental block reports
Date Fri, 15 Jul 2011 16:59:00 GMT

    [ https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13066071#comment-13066071 ]

Suresh Srinivas commented on HDFS-395:

> But for an excessive replica, it remains in the block map and excessiveBlockMap until
> an ack is back. They are the ones that need explicit acknowledgment.

I know that for deleted files, when a previously deleted replica is reported by a datanode to the
namenode, the NN can delete the replica again because the file no longer exists. But I wonder why
we do not also remove excess replicas from the map when their deletion is scheduled.
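To make the two policies concrete, here is a minimal Python sketch of a namenode-side replica map. It assumes a simplified model in which block ids map to sets of datanode names; the class `BlockMapSketch` and its methods are hypothetical and do not reflect the real namenode code or the attached patches. With `remove_on_schedule=False` an excess replica stays in the map until the datanode acknowledges the delete, as in the quoted comment; with `True` it is dropped as soon as its deletion is scheduled.

```python
from collections import defaultdict


class BlockMapSketch:
    """Hypothetical, simplified namenode replica map (not the HDFS code)."""

    def __init__(self, remove_on_schedule):
        self.locations = defaultdict(set)  # block id -> datanodes believed to hold it
        self.excess = defaultdict(set)     # datanode -> block ids scheduled for deletion
        self.remove_on_schedule = remove_on_schedule

    def add_replica(self, block_id, dn):
        self.locations[block_id].add(dn)

    def schedule_deletion(self, block_id, dn):
        if self.remove_on_schedule:
            # Alternative asked about in the comment: forget the replica right away.
            self.locations[block_id].discard(dn)
        else:
            # Behaviour described in the quote: keep it until the datanode acks.
            self.excess[dn].add(block_id)

    def on_delete_ack(self, block_id, dn):
        # The datanode confirms the replica is gone; clean up both maps.
        self.excess[dn].discard(block_id)
        self.locations[block_id].discard(dn)


bm = BlockMapSketch(remove_on_schedule=False)
bm.add_replica(1, "dn1")
bm.add_replica(1, "dn2")
bm.schedule_deletion(1, "dn2")
print(sorted(bm.locations[1]))  # ['dn1', 'dn2']: still listed until the ack
bm.on_delete_ack(1, "dn2")
print(sorted(bm.locations[1]))  # ['dn1']
```

In this simplified model, removing on schedule shrinks the map sooner, while waiting for the ack keeps the namenode's view in step with what the datanode has actually deleted.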

However, this could come in very handy in the HA implementation. Currently all namespace operations
go to the standby through the editlog. Having delete acks also creates a channel to report block
deletions to the standby. So I am +1 on delete acks from the perspective of HA.
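A minimal sketch of the HA idea, assuming the datanode sends each delete ack to both the active and the standby namenode (an assumption about the HA design, not something this jira specifies). `NamenodeView` and `send_delete_ack` are hypothetical names; the point is only that the standby's replica map is updated from the ack itself rather than from an edit-log entry.

```python
class NamenodeView:
    """Hypothetical per-namenode replica map: block id -> set of datanodes."""

    def __init__(self, role):
        self.role = role  # "active" or "standby"; informational only
        self.locations = {}

    def add_replica(self, block_id, dn):
        self.locations.setdefault(block_id, set()).add(dn)

    def on_delete_ack(self, block_id, dn):
        # Drop the replica only once the datanode reports it deleted.
        self.locations.get(block_id, set()).discard(dn)


def send_delete_ack(namenodes, block_id, dn):
    # Assumed HA behaviour: the datanode reports the deletion to every
    # namenode, so no edit-log record is needed for the standby to learn it.
    for nn in namenodes:
        nn.on_delete_ack(block_id, dn)


active, standby = NamenodeView("active"), NamenodeView("standby")
for nn in (active, standby):
    nn.add_replica(7, "dn1")
    nn.add_replica(7, "dn2")
send_delete_ack([active, standby], 7, "dn2")
print(active.locations[7], standby.locations[7])  # {'dn1'} {'dn1'}
```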

The directory scanner should use the mechanism in this jira to send the difference between the
in-memory block map and the disk. This could be done in another jira.
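One way to read this suggestion, as a sketch: the scan result is reduced to the same added/deleted shape an incremental block report would carry, so only the differences reach the namenode. The function name `scan_difference` and the report keys are hypothetical, and blocks are modelled as bare ids for simplicity.

```python
def scan_difference(in_memory, on_disk):
    """Hypothetical: diff the in-memory block map against a disk scan."""
    in_memory, on_disk = set(in_memory), set(on_disk)
    return {
        "added": on_disk - in_memory,    # found on disk but missing from memory
        "deleted": in_memory - on_disk,  # expected in memory but gone from disk
    }


diff = scan_difference(in_memory={101, 102, 103}, on_disk={102, 103, 104})
print(diff)  # {'added': {104}, 'deleted': {101}}
```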

> DFS Scalability: Incremental block reports
> ------------------------------------------
>                 Key: HDFS-395
>                 URL: https://issues.apache.org/jira/browse/HDFS-395
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: blockReportPeriod.patch, explicitDeleteAcks.patch
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 blocks and sends
> a block report to the namenode once every hour. This means that the namenode processes a block
> report once every 2 seconds. Each block report contains all blocks that the datanode currently
> hosts. This makes the namenode compare a huge number of blocks that practically remain the
> same between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of a full block
> report) be incremental. This will make the namenode process only those blocks that were
> added/deleted in the last period.
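The load figures in the description can be checked with a few lines of arithmetic, assuming the hourly reports are spread evenly across the hour (the description implies this but does not state it).

```python
SECONDS_PER_HOUR = 3600
datanodes = 1800
blocks_per_datanode = 50_000  # "around 50000" per the description

# Evenly staggered hourly reports: one report arrives every 3600 / 1800 seconds.
seconds_between_reports = SECONDS_PER_HOUR / datanodes
# Each full report lists every block on its datanode.
block_entries_per_second = blocks_per_datanode / seconds_between_reports
block_entries_per_hour = datanodes * blocks_per_datanode

print(seconds_between_reports)   # 2.0
print(block_entries_per_second)  # 25000.0
print(block_entries_per_hour)    # 90000000
```

So the namenode re-examines roughly 90 million block entries per hour, and for a fixed block count per node that figure grows linearly with the number of datanodes.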
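A minimal sketch of the incremental proposal, assuming a datanode keeps a snapshot of what it last reported successfully and afterwards sends only the difference, so the namenode processes only added/deleted blocks. `IncrementalReporter` and `apply_report` are hypothetical names, and the dict-based report is an illustration, not the wire format of `blockReportPeriod.patch` or any HDFS protocol. The baseline advances only after a report succeeds, so changes from a failed send are folded into the next delta.

```python
class IncrementalReporter:
    """Hypothetical datanode-side helper for incremental block reports."""

    def __init__(self, blocks):
        self.current = set(blocks)  # blocks the datanode hosts now
        self.last_sent = None       # None until a full report succeeds
        self._pending = None        # snapshot carried by the report in flight

    def on_block_added(self, block_id):
        self.current.add(block_id)

    def on_block_deleted(self, block_id):
        self.current.discard(block_id)

    def build_report(self):
        self._pending = frozenset(self.current)
        if self.last_sent is None:
            return {"type": "full", "blocks": set(self._pending)}
        return {
            "type": "incremental",
            "added": set(self._pending - self.last_sent),
            "deleted": set(self.last_sent - self._pending),
        }

    def report_succeeded(self):
        # Advance the baseline only after the namenode accepted the report.
        self.last_sent = self._pending


def apply_report(nn_blocks, report):
    """Namenode side: rebuild from a full report or patch with a delta."""
    if report["type"] == "full":
        return set(report["blocks"])
    return (set(nn_blocks) - report["deleted"]) | report["added"]


dn = IncrementalReporter({1, 2, 3})
nn = apply_report(set(), dn.build_report())  # first report is full
dn.report_succeeded()
dn.on_block_added(4)
dn.on_block_deleted(2)
delta = dn.build_report()
print(delta)  # {'type': 'incremental', 'added': {4}, 'deleted': {2}}
nn = apply_report(nn, delta)
dn.report_succeeded()
print(sorted(nn))  # [1, 3, 4]
```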

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

