hadoop-hdfs-issues mailing list archives

From "dhruba borthakur (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HDFS-1391) Exiting safemode takes a long time when there are lots of blocks in the HDFS
Date Sun, 19 Sep 2010 08:38:34 GMT

     [ https://issues.apache.org/jira/browse/HDFS-1391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HDFS-1391:
-----------------------------------

    Attachment: excessReplicas.1_trunk.txt

This patch reorganizes the code so that we can invoke chooseExcessReplicas without holding
the FSNamesystem lock.

At the time of exiting safemode, we walk through all the blocks and, for those blocks that
have excess replicas, we only insert them into the overReplicatedBlocks data structure. Thus,
exiting safemode is fast, which reduces the time to restart the namenode.

The ReplicationMonitor asynchronously processes the blocks in the overReplicatedBlocks data
structure.
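
The split described above can be illustrated with a minimal sketch. This is not the code in
the attached patch: the class DeferredExcessReplicaSketch, its methods, and the integer
replica counts are hypothetical stand-ins, and a plain ReentrantLock stands in for the
FSNamesystem lock. It only shows the shape of the idea: the safemode-exit scan records
candidates under the lock, and a later pass works out the excess with the lock released.

```java
import java.util.ArrayDeque;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch; none of these names are taken from the HDFS patch.
class DeferredExcessReplicaSketch {
    // Stand-in for the FSNamesystem lock.
    private final ReentrantLock namesystemLock = new ReentrantLock();
    // Stand-in for the blocksmap: block id -> number of live replicas.
    private final Map<Long, Integer> blocksMap = new LinkedHashMap<>();
    // Stand-in for the overReplicatedBlocks data structure.
    private final Queue<Long> overReplicatedBlocks = new ArrayDeque<>();
    private final int expectedReplication;

    DeferredExcessReplicaSketch(int expectedReplication) {
        this.expectedReplication = expectedReplication;
    }

    void addBlock(long blockId, int liveReplicas) {
        namesystemLock.lock();
        try {
            blocksMap.put(blockId, liveReplicas);
        } finally {
            namesystemLock.unlock();
        }
    }

    // Safemode exit: a cheap scan under the lock that only queues candidates.
    // Returns the number of blocks queued.
    int leaveSafeMode() {
        namesystemLock.lock();
        try {
            int queued = 0;
            for (Map.Entry<Long, Integer> e : blocksMap.entrySet()) {
                if (e.getValue() > expectedReplication) {
                    overReplicatedBlocks.add(e.getKey());
                    queued++;
                }
            }
            return queued;
        } finally {
            namesystemLock.unlock();
        }
    }

    // One step of the asynchronous pass: dequeue under the lock, then decide
    // how many replicas are excess with the lock released.
    // Returns the number of excess replicas, or 0 if the queue is empty.
    int processOneOverReplicatedBlock() {
        Long blockId = null;
        int live = 0;
        namesystemLock.lock();
        try {
            blockId = overReplicatedBlocks.poll();
            if (blockId == null) {
                return 0;
            }
            live = blocksMap.getOrDefault(blockId, expectedReplication);
        } finally {
            namesystemLock.unlock();
        }

        // The potentially expensive choice of which replicas to drop would
        // happen here, without the lock. The sketch only counts them.
        int excess = Math.max(0, live - expectedReplication);

        namesystemLock.lock();
        try {
            blocksMap.put(blockId, live - excess);
        } finally {
            namesystemLock.unlock();
        }
        return excess;
    }

    int pendingCount() {
        namesystemLock.lock();
        try {
            return overReplicatedBlocks.size();
        } finally {
            namesystemLock.unlock();
        }
    }

    public static void main(String[] args) {
        DeferredExcessReplicaSketch ns = new DeferredExcessReplicaSketch(3);
        ns.addBlock(1L, 3);
        ns.addBlock(2L, 5);
        System.out.println("queued at safemode exit: " + ns.leaveSafeMode());
        System.out.println("excess found later: " + ns.processOneOverReplicatedBlock());
    }
}
```

In this sketch a background thread would call processOneOverReplicatedBlock() in a loop,
playing the role that the ReplicationMonitor plays in the description above.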

> Exiting safemode takes a long time when there are lots of blocks in the HDFS
> ----------------------------------------------------------------------------
>
>                 Key: HDFS-1391
>                 URL: https://issues.apache.org/jira/browse/HDFS-1391
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: excessReplicas.1_trunk.txt
>
>
> When the namenode decides to exit safemode, it acquires the FSNamesystem lock and then
> iterates over all blocks in the blocksmap to determine whether any block has excess replicas.
> This call takes upwards of 5 minutes on a cluster that has 100 million blocks, which
> noticeably delays namenode restart.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

