hadoop-hdfs-issues mailing list archives

From "amith (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3512) Delay in scanning blocks at DN side when there are huge number of blocks
Date Wed, 06 Jun 2012 05:10:23 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13289954#comment-13289954 ]

amith commented on HDFS-3512:

This problem can occur in the following scenario.

When a new block is added to the DN, addBlock() in BPSC (BlockPoolSliceScanner) generates a lastScanTime as follows:
private synchronized long getNewBlockScanTime() {
    /* If there are a lot of blocks, this returns a random time with in
     * the scan period. Otherwise something sooner.
     */
    long period = Math.min(scanPeriod,
                           Math.max(blockMap.size(),1) * 600 * 1000L);
    int periodInt = Math.abs((int)period);
    return System.currentTimeMillis() - scanPeriod +
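
A minimal sketch of how this synthetic scan time behaves, assuming the cut-off return statement adds a random offset below periodInt (this matches the comment inside the method and the explanation later in this comment). The class name, the use of java.util.Random, and the three-week scan period are assumptions made for illustration, not taken from the DataNode code:

```java
import java.util.Random;

class NewBlockScanTimeSketch {
    // Hypothetical stand-ins for the scanner's fields; values are illustrative only.
    static final long SCAN_PERIOD_MS = 21L * 24 * 3600 * 1000; // assumed three-week scan period
    static final Random RANDOM = new Random();

    /** Returns a synthetic "last scan time" for a newly added block. */
    static long newBlockScanTime(long nowMs, int blockCount) {
        // Few blocks: a short window (10 minutes per block).
        // Many blocks: the window is capped at the full scan period.
        long period = Math.min(SCAN_PERIOD_MS,
                               Math.max(blockCount, 1) * 600 * 1000L);
        int periodInt = Math.abs((int) period);
        // Assumption: the truncated statement adds a random offset in [0, periodInt).
        return nowMs - SCAN_PERIOD_MS + RANDOM.nextInt(periodInt);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        // Offsets are printed relative to (now - scan period), i.e. the random part only.
        System.out.println(newBlockScanTime(now, 1) - (now - SCAN_PERIOD_MS));
        System.out.println(newBlockScanTime(now, 100_000) - (now - SCAN_PERIOD_MS));
    }
}
```

With a single block the result falls within ten minutes after (now - scanPeriod). With many blocks it can land anywhere in the scan period, so two blocks added at the same moment may receive very different lastScanTime values.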

The block is then added to blockInfoSet (a TreeSet), which compares it with the blocks
already present in the set to find its position.

Unscanned blocks are added at the head of blockInfoSet; after a block is scanned, it is
moved to the tail end of blockInfoSet.
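
The head-to-tail movement described above can be sketched with a plain TreeSet ordered by last scan time. The BlockInfo class, the comparator, and the scanHead() helper here are simplified, hypothetical stand-ins, not the actual BPSC types:

```java
import java.util.Comparator;
import java.util.TreeSet;

class BlockInfoSetSketch {
    /** Hypothetical, simplified per-block record. */
    static final class BlockInfo {
        final long blockId;
        long lastScanTime;

        BlockInfo(long blockId, long lastScanTime) {
            this.blockId = blockId;
            this.lastScanTime = lastScanTime;
        }
    }

    // Oldest scan time first; ties are broken by block id so that
    // distinct blocks never compare as equal.
    static final Comparator<BlockInfo> BY_SCAN_TIME =
        Comparator.comparingLong((BlockInfo b) -> b.lastScanTime)
                  .thenComparingLong(b -> b.blockId);

    /** "Scans" the head block at nowMs and re-inserts it, which moves it to the tail. */
    static BlockInfo scanHead(TreeSet<BlockInfo> set, long nowMs) {
        BlockInfo head = set.pollFirst(); // remove before changing the sort key
        head.lastScanTime = nowMs;
        set.add(head);
        return head;
    }

    public static void main(String[] args) {
        TreeSet<BlockInfo> set = new TreeSet<>(BY_SCAN_TIME);
        set.add(new BlockInfo(1, 100));
        set.add(new BlockInfo(2, 50));
        set.add(new BlockInfo(3, 200));

        System.out.println(set.first().blockId); // 2 (smallest lastScanTime)
        scanHead(set, 1000);
        System.out.println(set.first().blockId); // 1
        System.out.println(set.last().blockId);  // 2 (just scanned, now at the tail)
    }
}
```

The block is removed before its lastScanTime is changed because a TreeSet does not reorder an element whose sort key is mutated in place.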

When scan() runs, it scans the blocks from head to tail until it reaches a block that was
already scanned in the previous iteration, as determined by isFirstBlockProcessed().

We add a random number to the initial scan time. If the random number makes the scan time
greater than that of the other blocks, the new block is placed at the tail end, after the
already scanned blocks, which causes it to starve for block scanning.
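
To illustrate the starvation case, the sketch below counts how many blocks sort ahead of a never-scanned block for a small and a large random offset. It uses a TreeMap keyed by lastScanTime as a simplified stand-in for blockInfoSet; the block names, timestamps, and the blocksAhead() helper are made up for the example:

```java
import java.util.TreeMap;

class ScanStarvationSketch {
    /** Number of blocks whose lastScanTime is smaller than newBlockTime. */
    static int blocksAhead(TreeMap<Long, String> byScanTime, long newBlockTime) {
        return byScanTime.headMap(newBlockTime).size();
    }

    public static void main(String[] args) {
        long scanPeriod = 1_814_400_000L;       // assumed three-week period, in ms
        long now = 2_000_000_000_000L;          // arbitrary "current time"
        long base = now - scanPeriod;

        // Blocks that were really scanned, almost a full scan period ago.
        TreeMap<Long, String> byScanTime = new TreeMap<>();
        byScanTime.put(base + 1_000, "blk_1");
        byScanTime.put(base + 2_000, "blk_2");
        byScanTime.put(base + 3_000, "blk_3");

        long smallOffset = base + 10;       // new block with a small random offset
        long largeOffset = base + 500_000;  // new block with a large random offset

        System.out.println(blocksAhead(byScanTime, smallOffset)); // 0
        System.out.println(blocksAhead(byScanTime, largeOffset)); // 3
    }
}
```

With the large offset, the never-scanned block queues behind every block that has already been scanned, which is the ordering that delays its first scan.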

> Delay in scanning blocks at DN side when there are huge number of blocks
> ------------------------------------------------------------------------
>                 Key: HDFS-3512
>                 URL: https://issues.apache.org/jira/browse/HDFS-3512
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: data-node
>    Affects Versions: 2.0.1-alpha
>            Reporter: suja s
>            Assignee: amith
> Block scanner maintains the full list of blocks at DN side in a map and there is no differentiation
between the blocks which are already scanned and the ones not scanned. For every check (i.e.
every 5 secs) it will pick one block and scan. There are chances that it chooses a block which
is already scanned which leads to further delay in scanning of blocks which are yet to be
scanned.
