hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-395) DFS Scalability: Incremental block reports
Date Thu, 14 Jul 2011 20:48:59 GMT

    [ https://issues.apache.org/jira/browse/HDFS-395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13065523#comment-13065523 ]

Suresh Srinivas commented on HDFS-395:

From my understanding, when a namenode removes a block:
# The NN first deletes it from the block map and adds it to the invalidate set for the datanodes.
# The block is then removed from the invalidate set when it is sent to the datanodes for deletion.

The namenode has already deleted the block information from its data structures. Given this,
I am not sure what purpose the delete ack serves.
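
To make the two steps above concrete, here is a minimal sketch in Java. It is a toy model, not the actual NameNode code: the class ToyNamenode, its fields, and its method names are all hypothetical and exist only for illustration. It shows that once the deletion command has been handed out, the namenode retains no state that a delete ack could update.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of the two-step removal; hypothetical names, not real HDFS classes. */
class ToyNamenode {
    // blockId -> datanodes holding a replica (stands in for the block map)
    private final Map<Long, Set<String>> blockMap = new HashMap<>();
    // datanode -> blocks waiting to be sent for deletion (stands in for the invalidate set)
    private final Map<String, Set<Long>> invalidateSets = new HashMap<>();

    void addReplica(long blockId, String datanode) {
        blockMap.computeIfAbsent(blockId, k -> new HashSet<>()).add(datanode);
    }

    /** Step 1: drop the block from the block map and queue it for every holder. */
    void removeBlock(long blockId) {
        Set<String> holders = blockMap.remove(blockId);
        if (holders == null) {
            return; // unknown block, nothing to do
        }
        for (String dn : holders) {
            invalidateSets.computeIfAbsent(dn, k -> new HashSet<>()).add(blockId);
        }
    }

    /** Step 2: hand the queued blocks to the datanode and forget them. */
    List<Long> nextDeletionCommand(String datanode) {
        Set<Long> pending = invalidateSets.remove(datanode);
        return pending == null ? new ArrayList<>() : new ArrayList<>(pending);
    }

    boolean knowsBlock(long blockId) {
        return blockMap.containsKey(blockId);
    }

    boolean hasPendingDeletes(String datanode) {
        return invalidateSets.containsKey(datanode);
    }
}
```

In this model, after removeBlock() and nextDeletionCommand() have run for a datanode, neither map mentions the block for that datanode any more, which is the basis of the question about the ack.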

I wanted to do a variant of this for the following reasons:
If files related to block replicas are accidentally deleted on the datanode, the namenode is
not aware of the loss without periodic block reports. A replica file could also be truncated or modified.
The change I was thinking of is in DirectoryScanner; it currently reconciles the difference
between the block information in the datanode process and what is on the disk. This difference
is sent to the namenode in the block report. Instead of a full block report, we could send just
this diff to the namenode. With this we can reduce the frequency of block reports.
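
As a rough illustration of the kind of diff described above, the sketch below compares an in-memory view of the replicas with what a disk scan found. This is not the DirectoryScanner implementation: the class ReplicaDiff, the Change enum, and the choice of replica length as the only compared attribute are assumptions made for the example.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: diffs the datanode's in-memory replica view against a disk scan. */
class ReplicaDiff {
    enum Change { MISSING_ON_DISK, UNKNOWN_IN_MEMORY, LENGTH_MISMATCH }

    /**
     * Both maps go from block id to replica length in bytes.
     * Returns only the blocks whose state differs, keyed by block id.
     */
    static Map<Long, Change> diff(Map<Long, Long> inMemory, Map<Long, Long> onDisk) {
        Map<Long, Change> out = new TreeMap<>();
        for (Map.Entry<Long, Long> e : inMemory.entrySet()) {
            Long diskLen = onDisk.get(e.getKey());
            if (diskLen == null) {
                // replica file was deleted behind the datanode's back
                out.put(e.getKey(), Change.MISSING_ON_DISK);
            } else if (!diskLen.equals(e.getValue())) {
                // replica file was truncated or otherwise modified
                out.put(e.getKey(), Change.LENGTH_MISMATCH);
            }
        }
        for (Long id : onDisk.keySet()) {
            if (!inMemory.containsKey(id)) {
                // file exists on disk but the datanode process does not know it
                out.put(id, Change.UNKNOWN_IN_MEMORY);
            }
        }
        return out;
    }
}
```

Only the entries returned by diff() would need to go to the namenode, so the message size tracks the number of changed replicas rather than the total number of blocks on the datanode.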

+1 for renaming the file to target it for deletion. This addresses some of the race conditions
that Hairong brought up.

> DFS Scalability: Incremental block reports
> ------------------------------------------
>                 Key: HDFS-395
>                 URL: https://issues.apache.org/jira/browse/HDFS-395
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: blockReportPeriod.patch, explicitDeleteAcks.patch
> I have a cluster that has 1800 datanodes. Each datanode has around 50000 blocks and sends
> a block report to the namenode once every hour. This means that the namenode processes a block
> report once every 2 seconds. Each block report contains all blocks that the datanode currently
> hosts. This makes the namenode compare a huge number of blocks that practically remain the
> same between two consecutive reports. This wastes CPU on the namenode.
> The problem becomes worse when the number of datanodes increases.
> One proposal is to make succeeding block reports (after a successful send of a full block
> report) be incremental. This will make the namenode process only those blocks that were added/deleted
> in the last period.


