hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4075) Reduce recommissioning overhead
Date Thu, 18 Oct 2012 19:50:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479304#comment-13479304

Kihwal Lee commented on HDFS-4075:

We had a group of 40 nodes that were decommissioned and then recommissioned. When they were
recommissioned by refreshing nodes with dfsadmin, there were over 5M over-replicated blocks.
While holding the namesystem write lock, the NN (in an RPC handler) went through each of them
and generated two log messages per block. That took about 5 minutes and wrote over 2GB of
logs. Because of the locking, the namenode was unresponsive the whole time.

I tested the performance of the commons-logging + log4j FileAppender family combination, and
it was clear that the case above was hitting a logging bottleneck. Comparing a single-character
message against a 400-byte message, the time to log 1,000,000 messages was not much different.
The process was not IO bound but CPU bound: the CPU stayed at 100% the whole time. Changing
FileAppender properties affected the timing a bit, but not much. This appears to be an inherent
limit of this logging mechanism.
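
The exact benchmark setup isn't shown in this comment; a minimal stand-in using the JDK's
java.util.logging FileHandler (not the commons-logging + log4j stack that was actually
tested, and with hypothetical class/file names) would look something like:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class LogThroughputBench {
    // Time N messages of a given size through a file-backed logger and
    // return the average cost per message in microseconds.
    static long benchMicrosPerMessage(int messages, int messageBytes) throws IOException {
        Logger logger = Logger.getLogger("bench-" + messageBytes);
        logger.setUseParentHandlers(false);           // don't also log to the console
        FileHandler handler = new FileHandler("bench-" + messageBytes + ".log");
        handler.setFormatter(new SimpleFormatter());
        logger.addHandler(handler);

        char[] chars = new char[messageBytes];
        Arrays.fill(chars, 'x');
        String msg = new String(chars);               // fixed-size payload

        long start = System.nanoTime();
        for (int i = 0; i < messages; i++) {
            logger.log(Level.INFO, msg);
        }
        handler.close();
        logger.removeHandler(handler);
        return (System.nanoTime() - start) / 1000 / messages;
    }

    public static void main(String[] args) throws IOException {
        int n = 100_000;  // smaller than the 1,000,000 used in the test, to keep runtime short
        System.out.println("1 byte:    " + benchMicrosPerMessage(n, 1) + " us/msg");
        System.out.println("400 bytes: " + benchMicrosPerMessage(n, 400) + " us/msg");
    }
}
```

Absolute numbers will differ from log4j's, but the shape of the result (per-message CPU cost
dominating, message size mattering little) is what the comment describes.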

For single-character messages, each log call took 19-23us; that is, it could do about 42K
logs/sec with the CPU at 100% and almost no IO wait time. So the namenode in the case given
above was spending almost all of its time logging. The IO overhead was not significant.
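
A back-of-envelope check using the figures in this comment (the average line length is an
assumption, not something measured here) shows the numbers are self-consistent:

```java
public class RecommissionLoggingEstimate {
    public static void main(String[] args) {
        long blocks = 5_000_000L;   // over-replicated blocks, from the comment
        long msgsPerBlock = 2;      // two log messages per block
        long totalMsgs = blocks * msgsPerBlock;   // 10M messages

        double logsPerSec = 42_000; // measured single-character logging rate
        double seconds = totalMsgs / logsPerSec;  // ~238 s, close to the observed ~5 minutes
        System.out.printf("Estimated logging time: %.0f s%n", seconds);

        long bytesPerMsg = 250;     // ASSUMED average log line length
        double gb = totalMsgs * bytesPerMsg / 1e9; // ~2.5 GB, matching "over 2GB of logs"
        System.out.printf("Estimated log volume: %.1f GB%n", gb);
    }
}
```

The ~4 minutes of pure logging time leaves the rest of the observed 5-minute stall for the
actual block processing, which supports the conclusion that logging, not IO, was the bottleneck.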
> Reduce recommissioning overhead
> -------------------------------
>                 Key: HDFS-4075
>                 URL: https://issues.apache.org/jira/browse/HDFS-4075
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.4, 2.0.2-alpha
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
> When datanodes are recommissioned, {{BlockManager#processOverReplicatedBlocksOnReCommission()}}
is called for each rejoined node and excess blocks are added to the invalidate list. The problem
is that this is done while the namesystem write lock is held.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
