hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5497) Performance may suffer when UnderReplicatedBlocks is used heavily
Date Mon, 11 Nov 2013 20:47:19 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819359#comment-13819359 ]

Kihwal Lee commented on HDFS-5497:

I think it makes sense to increase the default minimum size to avoid repeated array growing
and shrinking when there is only a small number of under-replicated blocks.  In addition, I propose
making the minimum size configurable and offering an option to disable array growth/shrinkage
for users with a big namespace in their cluster.
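
A minimal sketch of how such a floor and on/off switch could look, assuming a simplified
stand-in class; the minCapacity and allowShrink fields shown here are hypothetical and the
real LightWeightLinkedSet in org.apache.hadoop.hdfs.util uses its own load-factor logic:

    // Sketch only, not the HDFS-5497 patch: a resize policy with a configurable
    // floor and the ability to disable shrinking entirely.
    class ResizableSetSketch {
      private Object[] entries;
      private int size;
      private final int minCapacity;      // configurable floor, e.g. 65536 instead of 16
      private final boolean allowShrink;  // false disables shrinkage for large namespaces

      ResizableSetSketch(int minCapacity, boolean allowShrink) {
        this.minCapacity = Math.max(1, minCapacity);
        this.allowShrink = allowShrink;
        this.entries = new Object[this.minCapacity];
      }

      /** Shrink only when allowed, and only down to the configured floor. */
      private void maybeShrink() {
        if (!allowShrink) {
          return; // keep the array at its current size
        }
        int target = Math.max(minCapacity, entries.length / 2);
        if (size < entries.length / 4 && target < entries.length) {
          Object[] shrunk = new Object[target];
          System.arraycopy(entries, 0, shrunk, 0, size); // assumes compacted entries
          entries = shrunk;
        }
      }
    }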

> Performance may suffer when UnderReplicatedBlocks is used heavily
> -----------------------------------------------------------------
>                 Key: HDFS-5497
>                 URL: https://issues.apache.org/jira/browse/HDFS-5497
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Kihwal Lee
> Currently UnderReplicatedBlocks uses LightWeightLinkedSet with a default initial size
> of 16.  If there are a lot of under-replicated blocks, insertion and removal can be very expensive.
> We see 450K to 1M under-replicated blocks during start-up, which typically go away soon
> after the last few data nodes join. With 450K under-replicated blocks, replication queue initialization
> would re-allocate the underlying array 15 times and reinsert elements over 1M times.  As block
> reports come in, it will go through the reverse.  I think this is one of the reasons why initial
> block reports after leaving safe mode can take a very long time to process.
> With a larger initial/minimum size, the timing gets significantly shorter. 
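
The quoted numbers can be reproduced with a back-of-envelope model: start at capacity 16,
double whenever the table fills, and insert 450K blocks. The doubling-on-full policy is an
assumption for illustration; LightWeightLinkedSet's actual load factors may trigger resizes
slightly earlier.

    // Rough cost model for growing from 16 slots to 450K entries.
    public class ResizeCostEstimate {
      public static void main(String[] args) {
        final int blocks = 450_000;
        long capacity = 16;
        int reallocations = 0;
        long reinserted = 0;   // elements copied into new arrays during resizes
        long size = 0;
        for (int i = 0; i < blocks; i++) {
          if (size == capacity) {
            capacity *= 2;
            reallocations++;
            reinserted += size; // every existing element is re-placed in the new array
          }
          size++;
        }
        // Prints 15 reallocations and ~524K copies during resizes; together with the
        // 450K initial insertions that is close to 1M element placements overall.
        System.out.println("reallocations = " + reallocations);
        System.out.println("re-inserted during resizes = " + reinserted);
        System.out.println("total element placements = " + (reinserted + blocks));
      }
    }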

This message was sent by Atlassian JIRA
