hadoop-hdfs-issues mailing list archives

From "Vinay (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5496) Make replication queue initialization asynchronous
Date Wed, 11 Dec 2013 03:45:11 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5496?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13845046#comment-13845046 ]

Vinay commented on HDFS-5496:

bq. For (4), looks like currently we only retrieve metrics information from postponedMisreplicatedBlocks
and we always check if the corresponding DNs are still stale before we make INVALIDATE decision.
Thus it should be safe if we delay its initialization. 
For this, I am trying to make some changes in the patch. I hope the next patch will include them.
bq. For (2), currently we add under-replicated blocks into neededReplications when 1) initially
populating the replication queue, 2) checking replication when finalizing an under-construction
file, 3) checking replication progress for decommissioning DN, and 4) pending replicas timeout.
Delaying 1) and making it happen in parallel with 2)~4) should also be safe.
I guess this is already in place, i.e. under-replicated blocks are not added to neededReplications
in {{processMisReplicatedBlock(..)}}.
{code}
    if (!block.isComplete()) {
      // Incomplete blocks are never considered mis-replicated --
      // they'll be reached when they are completed or recovered.
      return MisReplicationResult.UNDER_CONSTRUCTION;
    }
{code}
bq. For the current patch, I understand we need a new iterator that can iterate the blocksMap
and not throw exception when concurrent modifications happen. However, I guess we may only
need to define a new iterator and do not need to define the new BlocksMapGSet here. Also,
since the new iterator shares most of the code with the existing LightWeightGSet#SetIterator,
maybe we can simply extend SetIterator here?
Yes. Sure. 
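
To make the suggestion concrete, here is a minimal sketch of a tolerant iterator obtained by extending a fail-fast one. It is not the patch and not the real LightWeightGSet code: {{ToyGSet}}, its {{SetIterator}}, {{TolerantIterator}} and {{checkModification()}} are hypothetical stand-ins, and the set stores plain long values instead of blocks. The only point is that the subclass overrides the modification check and inherits the traversal logic.

```java
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Toy chained hash set standing in for a GSet; every name here is illustrative. */
class ToyGSet {
  private static final class Node {
    final long value;
    Node next;
    Node(long value, Node next) { this.value = value; this.next = next; }
  }

  private final Node[] buckets;
  private int modification = 0;  // bumped on every structural change

  ToyGSet(int capacity) { buckets = new Node[capacity]; }

  private int index(long v) { return (int) Math.floorMod(v, (long) buckets.length); }

  /** Inserts v at the head of its bucket unless it is already present. */
  void put(long v) {
    int i = index(v);
    for (Node n = buckets[i]; n != null; n = n.next) {
      if (n.value == v) return;
    }
    buckets[i] = new Node(v, buckets[i]);
    modification++;
  }

  /** Fail-fast iterator: throws if the set changed after the iterator was created. */
  class SetIterator implements Iterator<Long> {
    private final int startModification = modification;
    private int bucket = -1;   // index of the bucket currently being walked
    private Node node = null;  // next node to return, or null if not yet located

    /** Moves to the head of the next non-empty bucket when the current chain is exhausted. */
    private void advance() {
      while (node == null && bucket + 1 < buckets.length) {
        node = buckets[++bucket];
      }
    }

    protected void checkModification() {
      if (modification != startModification) {
        throw new ConcurrentModificationException();
      }
    }

    @Override public boolean hasNext() { advance(); return node != null; }

    @Override public Long next() {
      checkModification();
      advance();
      if (node == null) throw new NoSuchElementException();
      long v = node.value;
      node = node.next;
      return v;
    }
  }

  /** Same traversal, but changes made during iteration are tolerated. */
  class TolerantIterator extends SetIterator {
    @Override protected void checkModification() { /* intentionally a no-op */ }
  }

  Iterator<Long> strictIterator() { return new SetIterator(); }
  Iterator<Long> tolerantIterator() { return new TolerantIterator(); }
}
```

Such an iterator is only weakly consistent: an element inserted into a bucket that was already passed (or at the head of the chain being walked) is not seen by the scan. The sketch is also not thread-safe by itself; it assumes the caller holds a lock while iterating and that modifications happen only between those locked sections.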
bq. So for case 3, in non-HA setup, I think maybe we do not need to restart the processing
since there should not be any pending editlog for NN to process in startActiveService? In
HA setup, since we can always run processMisReplicateBlocks in startActiveService, we actually
do not need to populate replication queue while still in safemode? If we're able to make these
two changes, for the current patch, we do not need to worry about some already-running replication
initializing thread.
This can be done. Does "do not need to worry about some already-running replication initializing
thread" mean that the call should simply return if initialization is already in progress?
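
If the answer is yes, one way to read "just return the call" is a compare-and-set guard, sketched below. This is an assumption about the intended behaviour, not code from the patch: {{ReplicationQueueInitializer}}, {{startIfIdle()}} and the {{scan}} callback are hypothetical names, and the real change would live inside the block manager.

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical guard: starts a background scan unless one is already running. */
class ReplicationQueueInitializer {
  private final AtomicBoolean inProgress = new AtomicBoolean(false);
  private final Runnable scan;  // stands in for the mis-replicated block scan

  ReplicationQueueInitializer(Runnable scan) { this.scan = scan; }

  /** @return true if this call started the scan, false if one was already in progress. */
  boolean startIfIdle() {
    // Only the caller that flips the flag from false to true starts a thread.
    if (!inProgress.compareAndSet(false, true)) {
      return false;
    }
    Thread t = new Thread(() -> {
      try {
        scan.run();
      } finally {
        inProgress.set(false);  // allow a later call to start a fresh scan
      }
    }, "replication-queue-initializer");
    t.setDaemon(true);
    t.start();
    return true;
  }

  boolean isInProgress() { return inProgress.get(); }
}
```

With this shape a second call made while the first scan is still running returns false immediately instead of restarting or waiting for the thread.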

> Make replication queue initialization asynchronous
> --------------------------------------------------
>                 Key: HDFS-5496
>                 URL: https://issues.apache.org/jira/browse/HDFS-5496
>             Project: Hadoop HDFS
>          Issue Type: Sub-task
>          Components: namenode
>            Reporter: Kihwal Lee
>         Attachments: HDFS-5496.patch
> Today, initialization of replication queues blocks safe mode exit and certain HA state
transitions. For a big name space, this can take hundreds of seconds with the FSNamesystem
write lock held.  During this time, important requests (e.g. initial block reports, heartbeat,
etc) are blocked.
> The effect of delaying the initialization would be not starting replication right away,
but I think the benefit outweighs the cost. If we make it asynchronous, the work per iteration should
be limited, so that the lock duration is capped. 
> If full/incremental block reports and any other requests that modify block state properly
perform replication checks while the blocks are scanned and the queues are populated in the background,
every block will be processed. (Some may be done twice)  The replication monitor should run
even before all blocks are processed.
> This will allow namenode to exit safe mode and start serving immediately even with a
big name space. It will also reduce the HA failover latency.
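
The "work per iteration should be limited" idea from the description can be sketched as a scan that takes the write lock for a bounded number of blocks at a time and releases it in between. This is an illustration under assumptions, not the HDFS implementation: {{ChunkedBlockScanner}}, {{blocksPerIteration}} and the {{check}} callback are hypothetical, and a plain ReentrantReadWriteLock stands in for the FSNamesystem lock.

```java
import java.util.Iterator;
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Consumer;

/** Hypothetical chunked scan: holds the write lock for at most N blocks at a time. */
class ChunkedBlockScanner {
  private final ReentrantReadWriteLock lock;  // stand-in for the namesystem lock
  private final int blocksPerIteration;

  ChunkedBlockScanner(ReentrantReadWriteLock lock, int blocksPerIteration) {
    if (blocksPerIteration <= 0) {
      throw new IllegalArgumentException("blocksPerIteration must be positive");
    }
    this.lock = lock;
    this.blocksPerIteration = blocksPerIteration;
  }

  /**
   * Applies check to every element, processing at most blocksPerIteration
   * elements per lock acquisition.
   * @return the number of times the write lock was acquired
   */
  <T> int scan(Iterator<T> blocks, Consumer<T> check) {
    int iterations = 0;
    boolean more = true;
    while (more) {
      lock.writeLock().lock();
      try {
        for (int i = 0; i < blocksPerIteration && blocks.hasNext(); i++) {
          check.accept(blocks.next());
        }
        more = blocks.hasNext();
      } finally {
        // Releasing here lets other lock waiters (e.g. request handlers) make progress.
        lock.writeLock().unlock();
      }
      iterations++;
    }
    return iterations;
  }
}
```

Because the lock is dropped between chunks, the underlying collection may change during the scan, so the iterator passed in has to tolerate modifications made between lock acquisitions.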

This message was sent by Atlassian JIRA
