hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-5497) Performance may suffer when UnderReplicatedBlocks is used heavily
Date Mon, 11 Nov 2013 20:43:18 GMT

    [ https://issues.apache.org/jira/browse/HDFS-5497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13819353#comment-13819353 ]

Kihwal Lee commented on HDFS-5497:

I experimented with 125M blocks. The following are the replication queue initialization times.
- No under-replicated blocks: 29 seconds
- Worst case - all under-replicated: 252 seconds

I made the initial size configurable and also disabled automatic growing of the array, i.e.
the hash collision chains are left to grow instead.
- Worst case with 100M entries pre-allocated : 68 seconds
- Worst case with 10M entries pre-allocated : 77 seconds
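
The experiment above can be pictured with a minimal sketch of a chained hash set whose bucket array is sized once and never grows. This is an illustration only, not the actual patch and not Hadoop's LightWeightLinkedSet: the class name FixedBucketSet, its methods, and the plain modulo bucket index are all hypothetical.

```java
import java.util.Objects;

// Hypothetical illustration only; this is not Hadoop's LightWeightLinkedSet.
final class FixedBucketSet<T> {
    // One singly linked collision chain per bucket.
    private static final class Node<T> {
        final T value;
        Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final Node<T>[] buckets; // allocated once, never resized
    private int size;

    @SuppressWarnings("unchecked")
    FixedBucketSet(int initialCapacity) {
        if (initialCapacity <= 0) {
            throw new IllegalArgumentException("initialCapacity must be positive");
        }
        this.buckets = (Node<T>[]) new Node[initialCapacity];
    }

    // Assumption: a simple modulo of hashCode() picks the bucket.
    private int indexOf(Object value) {
        return Math.floorMod(value.hashCode(), buckets.length);
    }

    // Adds the value unless it is already present; the chain grows, the array does not.
    boolean add(T value) {
        Objects.requireNonNull(value);
        int i = indexOf(value);
        for (Node<T> n = buckets[i]; n != null; n = n.next) {
            if (n.value.equals(value)) {
                return false;
            }
        }
        buckets[i] = new Node<>(value, buckets[i]);
        size++;
        return true;
    }

    // Lookup cost is proportional to the length of one collision chain.
    boolean contains(Object value) {
        Objects.requireNonNull(value);
        for (Node<T> n = buckets[indexOf(value)]; n != null; n = n.next) {
            if (n.value.equals(value)) {
                return true;
            }
        }
        return false;
    }

    // Unlinks the value from its chain; the bucket array is never shrunk.
    boolean remove(Object value) {
        Objects.requireNonNull(value);
        int i = indexOf(value);
        Node<T> prev = null;
        for (Node<T> n = buckets[i]; n != null; prev = n, n = n.next) {
            if (n.value.equals(value)) {
                if (prev == null) {
                    buckets[i] = n.next;
                } else {
                    prev.next = n.next;
                }
                size--;
                return true;
            }
        }
        return false;
    }

    int size() {
        return size;
    }

    // Entries per bucket, counting empty buckets.
    double averageChainLength() {
        return (double) size / buckets.length;
    }
}
```

With this shape, inserting more entries than there are buckets never triggers a re-allocation or rehash; the only cost is a longer walk along each chain.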

I also measured the time to reinsert all blocks in order to see how the lookup time changes with
the longer collision chains.
- Reinsert with 100M entries pre-allocated, average chain length 1.2 : 32 seconds
- Reinsert with 10M entries pre-allocated, average chain length 12 : 33 seconds
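
As a rough cross-check of the chain lengths above, the sketch below assumes that "average chain length" means entries divided by buckets and that all 125M blocks are in the set; the message does not say how the figure was measured. The class and method names are hypothetical.

```java
// Hypothetical helper; assumes "average chain length" means entries per bucket.
final class ChainLength {
    private ChainLength() {}

    static double average(long entries, long buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException("buckets must be positive");
        }
        return (double) entries / buckets;
    }
}
```

Under that assumption, ChainLength.average(125_000_000L, 100_000_000L) gives 1.25 and ChainLength.average(125_000_000L, 10_000_000L) gives 12.5, which is in line with the reported 1.2 and 12.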

> Performance may suffer when UnderReplicatedBlocks is used heavily
> -----------------------------------------------------------------
>                 Key: HDFS-5497
>                 URL: https://issues.apache.org/jira/browse/HDFS-5497
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: namenode
>            Reporter: Kihwal Lee
> Currently UnderReplicatedBlocks uses LightWeightLinkedSet with the default initial size
> of 16. If there are a lot of under-replicated blocks, insertion and removal can be very expensive.
> We see 450K to 1M under-replicated blocks during start-up, which typically go away soon
> as the last few data nodes join. With 450K under-replicated blocks, replication queue initialization
> would re-allocate the underlying array 15 times and reinsert elements over 1M times. As block
> reports come in, it will go through the reverse. I think this is one of the reasons why initial
> block reports after leaving safe mode can take a very long time to process.
> With a larger initial/minimum size, the timing gets significantly shorter.

This message was sent by Atlassian JIRA