hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4075) Reduce recommissioning overhead
Date Thu, 18 Oct 2012 19:50:03 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479304#comment-13479304 ]

Kihwal Lee commented on HDFS-4075:
----------------------------------

We had a group of 40 nodes that were decommissioned and then recommissioned. When they were
recommissioned by refreshing nodes with dfsadmin, there were over 5M over-replicated blocks.
While holding the namesystem write lock, the NN (in an RPC handler) went through each of
them and generated two log messages per block. That took about 5 minutes and wrote over 2GB
of logs. Because of the locking, the namenode was unresponsive for the whole time.

I tested the performance of the commons-logging + log4j FileAppender combination, and it was
clear that the case above was hitting a logging bottleneck. Comparing a single-character
message against a 400-byte message, the time to log 1,000,000 messages was not much different.
The workload was not IO bound but CPU bound: the CPU stayed at 100% the whole time. Changing
FileAppender properties affected the timing slightly, but not by much. This appears to be an
inherent limit of this logging mechanism.

For single-character logging, each message took 19-23us, i.e., about 42K log messages/sec
with the CPU at 100% and almost no IO wait time. This shows the namenode in the case above
was spending almost all of its time logging; the IO overhead was not significant.
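The measurement above can be reproduced with a small micro-benchmark. This is a minimal sketch, not the harness I used: it substitutes java.util.logging's FileHandler for the commons-logging + log4j FileAppender combination (so it runs without extra jars), and the class/method names (LogBench, benchmark) are made up for illustration. The shape of the experiment is the same: log N copies of a message to a file and compute the average cost per message for a 1-byte vs. a 400-byte payload.

```java
import java.util.Arrays;
import java.util.logging.FileHandler;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class LogBench {
    // Logs n copies of msg and returns the average cost in microseconds per message.
    static double benchmark(Logger log, String msg, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            log.info(msg);
        }
        return (System.nanoTime() - start) / 1000.0 / n;
    }

    public static void main(String[] args) throws Exception {
        Logger log = Logger.getLogger("bench");
        log.setUseParentHandlers(false);              // file only, no console echo
        FileHandler fh = new FileHandler("logbench.out");
        fh.setFormatter(new SimpleFormatter());
        log.addHandler(fh);

        char[] pad = new char[400];                   // 400-byte payload, as in the test above
        Arrays.fill(pad, 'x');

        int n = 100_000;                              // scaled down from the 1M-message run
        double small = benchmark(log, "x", n);
        double large = benchmark(log, new String(pad), n);
        System.out.printf("1-byte: %.1f us/msg, 400-byte: %.1f us/msg%n", small, large);
        fh.close();
    }
}
```

If most of the per-message cost is formatting and synchronization rather than IO, the two averages come out close, which is the CPU-bound behavior described above.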
                
> Reduce recommissioning overhead
> -------------------------------
>
>                 Key: HDFS-4075
>                 URL: https://issues.apache.org/jira/browse/HDFS-4075
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: name-node
>    Affects Versions: 0.23.4, 2.0.2-alpha
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>
> When datanodes are recommissioned, {{BlockManager#processOverReplicatedBlocksOnReCommission()}}
> is called for each rejoined node, and excess blocks are added to the invalidate list. The
> problem is that this is done while the namesystem write lock is held.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
