hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4075) Reduce recommissioning overhead
Date Thu, 18 Oct 2012 20:10:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479315#comment-13479315 ]

Kihwal Lee commented on HDFS-4075:

On recommissioning, the dead nodes will not cause this overhead at that moment (i.e., not within
the same write-lock block). They will cause their own share of the logging storm when they rejoin
and send in their full block reports, which would block the namenode for 6-7 seconds in the
above example. They will at least let other operations run between such block reports. Alternatively,
the nodes can be brought up in a controlled manner to reduce the impact, e.g. two datanode start-ups
per minute.
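
To illustrate the controlled start-up idea, here is a minimal sketch that spaces datanode start-ups evenly at a given rate. The class name StaggeredStartPlan, the method plan(), and the host names are hypothetical and are not part of Hadoop; the sketch only computes the delays and leaves the actual start mechanism to the operator.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of Hadoop: computes a staggered start schedule. */
class StaggeredStartPlan {

    /**
     * Returns, for each host in order, how long to wait after the first
     * start-up before starting that host, so that at most
     * startsPerMinute datanodes come up per minute.
     */
    static Map<String, Duration> plan(List<String> hosts, int startsPerMinute) {
        if (startsPerMinute <= 0) {
            throw new IllegalArgumentException("startsPerMinute must be positive");
        }
        // Two start-ups per minute gives a 30-second interval.
        Duration interval = Duration.ofMinutes(1).dividedBy(startsPerMinute);
        Map<String, Duration> offsets = new LinkedHashMap<>();
        for (int i = 0; i < hosts.size(); i++) {
            offsets.put(hosts.get(i), interval.multipliedBy(i));
        }
        return offsets;
    }
}
```

With a rate of two per minute, the first node starts immediately, the second after 30 seconds, the third after 60 seconds, and so on, which leaves the namenode time to serve other requests between full block reports.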

But the nodes that are live at the time of recommissioning can cause problems, unless the processing
of potentially over-replicated blocks becomes asynchronous to recommissioning and is also throttled.
Doing the invalidation inline but pausing and releasing the lock periodically is not ideal, since it
will prolong the execution of the refreshNodes command. Deferring this work to the mis-replicated
block handling would make it asynchronous, but it could not be throttled: at the next block report,
all of the blocks would be processed.
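
As a rough sketch of what "asynchronous and throttled" could look like: the recommissioning path only enqueues the candidate blocks, and a separate worker handles a bounded number of them per write-lock hold, releasing the lock between batches. Everything here (the class ThrottledOverReplicationProcessor, its methods, and the use of plain block IDs) is a hypothetical stand-in and not the actual BlockManager code.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical sketch; names and structure are assumptions, not HDFS code. */
class ThrottledOverReplicationProcessor {
    // Stand-in for the namesystem lock.
    private final ReentrantReadWriteLock namesystemLock = new ReentrantReadWriteLock();
    private final Queue<Long> pending = new ArrayDeque<>();
    private final List<Long> invalidated = new ArrayList<>();
    private final int maxBlocksPerLockHold;

    ThrottledOverReplicationProcessor(int maxBlocksPerLockHold) {
        if (maxBlocksPerLockHold <= 0) {
            throw new IllegalArgumentException("maxBlocksPerLockHold must be positive");
        }
        this.maxBlocksPerLockHold = maxBlocksPerLockHold;
    }

    /** Called on recommissioning: only queues the work, so it returns quickly. */
    synchronized void enqueue(Collection<Long> blockIds) {
        pending.addAll(blockIds);
    }

    /** Handles at most one batch under the write lock; returns the batch size. */
    int processOneBatch() {
        namesystemLock.writeLock().lock();
        try {
            int processed = 0;
            while (processed < maxBlocksPerLockHold) {
                Long id;
                synchronized (this) {
                    id = pending.poll();
                }
                if (id == null) {
                    break;
                }
                invalidated.add(id); // placeholder for the real excess-replica handling
                processed++;
            }
            return processed;
        } finally {
            namesystemLock.writeLock().unlock();
        }
    }

    /** Worker loop: batches separated by a pause; returns the number of non-empty batches. */
    int drain(long pauseMillis) {
        int batches = 0;
        while (processOneBatch() > 0) {
            batches++;
            try {
                Thread.sleep(pauseMillis); // other lock waiters can run here
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return batches;
    }

    synchronized int pendingCount() {
        return pending.size();
    }

    int invalidatedCount() {
        namesystemLock.readLock().lock();
        try {
            return invalidated.size();
        } finally {
            namesystemLock.readLock().unlock();
        }
    }
}
```

The batch size and the pause are the two throttling knobs: a smaller batch shortens each lock hold, and a longer pause gives other operations more room, at the cost of a longer total time to clear the excess replicas.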

I think the simplest remedy is to disable the state change logging for block invalidation
during recommissioning. 

On a busy namenode, the overhead of logging every block state change may not be negligible.
We might want to add a capability to selectively disable certain classes of state change logging.
(There are already places that disable logging for every block.)
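
A minimal sketch of such a capability, assuming log messages are grouped into categories that can be switched off individually. The class SelectiveStateChangeLog, the Category enum, and the sink-based design are all hypothetical; the real namenode uses its own state change logger, which this does not attempt to reproduce.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Supplier;

/** Hypothetical sketch of per-category state change logging; not HDFS code. */
class SelectiveStateChangeLog {
    /** Illustrative categories only. */
    enum Category { BLOCK_INVALIDATION, BLOCK_ADDITION, REPLICATION }

    private final Set<Category> disabled =
        Collections.synchronizedSet(EnumSet.noneOf(Category.class));
    private final Consumer<String> sink;

    SelectiveStateChangeLog(Consumer<String> sink) {
        this.sink = sink;
    }

    boolean isEnabled(Category category) {
        return !disabled.contains(category);
    }

    /** The message is built only if the category is enabled. */
    void log(Category category, Supplier<String> message) {
        if (isEnabled(category)) {
            sink.accept(category + ": " + message.get());
        }
    }

    /** Runs the action with one category silenced, then restores the prior state. */
    void runWithDisabled(Category category, Runnable action) {
        boolean wasEnabled = disabled.add(category);
        try {
            action.run();
        } finally {
            if (wasEnabled) {
                disabled.remove(category);
            }
        }
    }
}
```

Under this sketch, the recommissioning path would wrap its over-replication processing in runWithDisabled(Category.BLOCK_INVALIDATION, ...), so that per-block invalidation messages are skipped there while other categories and other code paths keep logging as usual. Taking the message as a Supplier also avoids building the string when the category is off.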

> Reduce recommissioning overhead
> -------------------------------
>                 Key: HDFS-4075
>                 URL: https://issues.apache.org/jira/browse/HDFS-4075
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.4, 2.0.2-alpha
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
> When datanodes are recommissioned, {BlockManager#processOverReplicatedBlocksOnReCommission()}
> is called for each rejoined node and excess blocks are added to the invalidate list. The problem
> is this is done while the namesystem write lock is held.

