hadoop-hdfs-issues mailing list archives

From "Jing Zhao (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
Date Tue, 10 Dec 2013 20:06:08 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13844607#comment-13844607 ]

Jing Zhao commented on HDFS-5496:

Hi [~vinayrpet], your proposed solution looks good to me. Currently the processMisReplicatedBlock
method updates 4 different data structures: (1) invalidateBlocks, storing blocks that do
not belong to any file, (2) neededReplications, storing blocks that need to be replicated,
(3) excessReplicateMap, tracking over-replicated blocks (these blocks are added into
invalidateBlocks too), and (4) postponedMisreplicatedBlocks, storing blocks that seem
over-replicated but for which we still need to wait for the deletion report from the corresponding DNs.

For (4), it looks like we currently only retrieve metrics information from postponedMisreplicatedBlocks,
and we always check whether the corresponding DNs are still stale before we make an INVALIDATE decision.
Thus it should be safe to delay its initialization. For (2), we currently add under-replicated
blocks into neededReplications when 1) initially populating the replication queue, 2) checking
replication when finalizing an under-construction file, 3) checking replication progress for
a decommissioning DN, and 4) pending replicas time out. Delaying 1) and making it happen in parallel
with 2)~4) should also be safe.
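
To make the argument for (2) concrete, here is a rough sketch of a background scan that populates the queue in bounded chunks while other paths keep adding blocks. Everything in it is an assumption for illustration: ChunkedQueueInitializer, blocksPerIteration, the List standing in for the blocksMap, and the predicate standing in for the real replication check are hypothetical names, not code from HDFS or the attached patch.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.LongPredicate;

/** Hypothetical sketch only; these names are not from HDFS or the patch. */
class ChunkedQueueInitializer {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<Long> blocks;             // stand-in for the blocksMap
    private final LongPredicate underReplicated; // stand-in for the real replication check
    private final int blocksPerIteration;        // caps the work done per lock hold
    // A set, so a block queued by both the scan and another path appears once.
    private final Set<Long> neededReplications = new LinkedHashSet<>();
    private int scanIterations = 0;

    ChunkedQueueInitializer(List<Long> blocks, int blocksPerIteration,
                            LongPredicate underReplicated) {
        this.blocks = blocks;
        this.blocksPerIteration = blocksPerIteration;
        this.underReplicated = underReplicated;
    }

    /** Path 1): the delayed initial scan, one bounded chunk per lock acquisition. */
    void run() {
        int next = 0;
        while (next < blocks.size()) {
            lock.writeLock().lock();
            try {
                scanIterations++;
                int end = Math.min(next + blocksPerIteration, blocks.size());
                for (; next < end; next++) {
                    long block = blocks.get(next);
                    if (underReplicated.test(block)) {
                        neededReplications.add(block);
                    }
                }
            } finally {
                // Released between chunks so other requests can make progress.
                lock.writeLock().unlock();
            }
        }
    }

    /** Paths 2)~4): any other caller that checks a single block. */
    void checkReplication(long block) {
        lock.writeLock().lock();
        try {
            if (underReplicated.test(block)) {
                neededReplications.add(block);
            }
        } finally {
            lock.writeLock().unlock();
        }
    }

    int getScanIterations() {
        return scanIterations;
    }

    int neededCount() {
        lock.readLock().lock();
        try {
            return neededReplications.size();
        } finally {
            lock.readLock().unlock();
        }
    }
}
```

In this sketch a block that is checked by checkReplication before the scan reaches it is simply examined twice, and the set keeps a single entry for it, which is why running 1) in parallel with 2)~4) does not change the final queue contents.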

For the current patch, I understand we need a new iterator that can iterate over the blocksMap
without throwing an exception when concurrent modifications happen. However, I guess we may only
need to define a new iterator, and do not need to define the new BlocksMapGSet here. Also,
since the new iterator shares most of its code with the existing LightWeightGSet#SetIterator,
maybe we can simply extend SetIterator here?
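
A minimal sketch of the "extend the iterator" idea, under stated assumptions: SimpleGSet below is a simplified stand-in written for this example, not the real LightWeightGSet, and the checkModification hook and TolerantIterator are hypothetical names. The real SetIterator may be structured differently, so this only shows the shape of the change.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Simplified stand-in for a bucketed set; not the real LightWeightGSet. */
class SimpleGSet<E> implements Iterable<E> {
    private static final class Node<E> {
        final E item;
        final Node<E> next;
        Node(E item, Node<E> next) { this.item = item; this.next = next; }
    }

    private final Node<E>[] buckets;
    private int modification = 0;

    @SuppressWarnings("unchecked")
    SimpleGSet(int capacity) {
        buckets = (Node<E>[]) new Node[capacity];
    }

    /** Adds at the head of the bucket; assumes callers only add distinct elements. */
    void put(E e) {
        int i = Math.floorMod(e.hashCode(), buckets.length);
        buckets[i] = new Node<>(e, buckets[i]);
        modification++;
    }

    /** Fail-fast iterator: throws if the set changed since it was created. */
    class SetIterator implements Iterator<E> {
        private final int startModification = modification;
        private int index = -1;
        private Node<E> next = nextNonEmpty(null);

        private Node<E> nextNonEmpty(Node<E> from) {
            Node<E> n = (from == null) ? null : from.next;
            while (n == null && ++index < buckets.length) {
                n = buckets[index];
            }
            return n;
        }

        /** Hypothetical hook that a subclass can override to relax the check. */
        protected void checkModification() {
            if (modification != startModification) {
                throw new ConcurrentModificationException();
            }
        }

        @Override
        public boolean hasNext() {
            return next != null;
        }

        @Override
        public E next() {
            checkModification();
            if (next == null) {
                throw new NoSuchElementException();
            }
            E item = next.item;
            next = nextNonEmpty(next);
            return item;
        }
    }

    /** Reuses all traversal code and only drops the fail-fast check. */
    class TolerantIterator extends SetIterator {
        @Override
        protected void checkModification() {
            // Intentionally empty: modifications during iteration are tolerated.
        }
    }

    @Override
    public Iterator<E> iterator() {
        return new SetIterator();
    }

    Iterator<E> tolerantIterator() {
        return new TolerantIterator();
    }
}
```

The tolerant iterator in this sketch is only weakly consistent: an element added during the scan may or may not be visited, depending on which bucket it lands in. That seems acceptable only if the paths that modify block state run their own replication checks.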

> Make replication queue initialization asynchronous
> --------------------------------------------------
>                 Key: HDFS-5496
>                 URL: https://issues.apache.org/jira/browse/HDFS-5496
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Kihwal Lee
>         Attachments: HDFS-5496.patch
> Today, initialization of replication queues blocks safe mode exit and certain HA state
transitions. For a big name space, this can take hundreds of seconds with the FSNamesystem
write lock held. During this time, important requests (e.g. initial block reports, heartbeats,
etc.) are blocked.
> The effect of delaying the initialization would be not starting replication right away,
but I think the benefit outweighs the cost. If we make it asynchronous, the work per iteration should
be limited, so that the lock duration is capped.
> If full/incremental block reports and any other requests that modify block state properly
perform replication checks while the blocks are scanned and the queues populated in the background,
every block will be processed. (Some may be done twice.) The replication monitor should run
even before all blocks are processed.
> This will allow the namenode to exit safe mode and start serving immediately even with a
big name space. It will also reduce the HA failover latency.

This message was sent by Atlassian JIRA
