hadoop-hdfs-issues mailing list archives

From "Yi Liu (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7174) Support for more efficient large directories
Date Fri, 03 Oct 2014 05:06:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14157676#comment-14157676 ]

Yi Liu commented on HDFS-7174:
------------------------------

[~kihwal], I like the idea. It improves add/remove performance for large directories while barely
increasing memory usage (there is some increase when switching to {{INodeHashedArrayList}},
explained below), and search performance is not degraded.
* A table of 1024 lists needs 1024 * 4 = 4K of memory to store the ArrayList references (assuming
a reference is 4 bytes). Originally there was only 1 ArrayList, but now there are 1024 of them, and
each Java object carries a header, say 12 bytes, so the added object overhead is ~1024*12 = 12K.
In total that is roughly 16K of extra memory for this FSDirectory (see the sketch below).
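
For concreteness, here is a rough sketch of the kind of structure that estimate assumes; this is
not the actual patch, and the bucketing by name hash is my assumption:
{code}
// Rough sketch, not the HDFS-7174 patch: a fixed table of 1024 buckets, each a
// small sorted list of child names selected by hashing the name. Compared to a
// single sorted list, only the 1024 bucket references (~4K) and up to 1024
// extra list objects (~12K of object overhead) are added.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class HashedChildrenSketch {
  private static final int TABLE_SIZE = 1024;

  @SuppressWarnings("unchecked")
  private final List<String>[] buckets = new List[TABLE_SIZE];

  private int bucketFor(String name) {
    return (name.hashCode() & 0x7fffffff) % TABLE_SIZE;
  }

  void addChild(String name) {
    int i = bucketFor(name);
    if (buckets[i] == null) {
      buckets[i] = new ArrayList<>();          // per-bucket list created lazily
    }
    int pos = Collections.binarySearch(buckets[i], name);
    if (pos < 0) {
      buckets[i].add(-pos - 1, name);          // insert shifts only a small bucket
    }
  }
}
{code}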

Currently the children of {{INodeDirectory}} are stored in a sorted array list. The time complexity is:
Add: O(log n) search + list resize and array copy.
Delete: O(log n) search + array copy.
Search: O(log n)
When the list gets bigger, _resizing the list (allocating new memory) and copying the array_ becomes
a performance issue.
Your idea is a good way to solve this.
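
For reference, a minimal, self-contained illustration of that current behavior, using plain strings
in place of {{INode}} children:
{code}
// Current scheme in miniature: children kept sorted in one ArrayList, so every
// insert is a binary search followed by ArrayList.add(index), which shifts all
// later elements and occasionally reallocates and copies the whole array.
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class SortedChildrenIllustration {
  private final List<String> children = new ArrayList<>();

  boolean addChild(String name) {
    int pos = Collections.binarySearch(children, name);  // O(log n) search
    if (pos >= 0) {
      return false;                                      // already present
    }
    children.add(-pos - 1, name);   // O(n) shift, plus occasional grow-and-copy
    return true;
  }
}
{code}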

{quote}
The biggest performance hit is due to the way insertion is done in ArrayList. An insertion
requires copying an average of n/2 entries. Also, as the list gets bigger, a new one with
a bigger size needs to be allocated, copied, and partially copied again. For big directories,
we are talking megabytes. Far better performance can be obtained by using different data structures,
but I believe ArrayList (originally primitive array) was chosen to minimize the memory usage.
{quote}
Agreed.

{quote}
I wonder, is it worthwhile to try to store the inodeid as a long[] directly?
As a reference, for C++ on modern machines, inserting elements into std::vector is consistently
faster than into std::list.
{quote}
ArrayList is also memory-efficient: it stores an array of references and only adds a small object
overhead (maybe 12 bytes?).
std::vector is similar to ArrayList in Java (ArrayList and Vector in Java seem to differ only in
synchronization and growth policy).

I have a few comments:
*1.* I see you chose the threshold 50000, and in the performance grid {{ArrayList}} and
{{Auto Switching}} are close at that point. I suggest choosing a somewhat bigger value, since we
need to account for the memory increase explained above.
*2.* Regarding the following code:
{code}
@Override
public void add(int index, INode node) {
  getList().add(index, node);
  size++;
  checkSize();
}
{code}
We could check the size first and then do the add; that would be more efficient. The same applies
to delete, as sketched below.
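A possible shape of that reordering (not from the patch; it assumes {{checkSize()}} only switches
the backing structure and that positional add remains valid after a switch):
{code}
@Override
public void add(int index, INode node) {
  size++;
  checkSize();                  // possibly switch the backing list first
  getList().add(index, node);   // the insert then lands in the (possibly new) list
}
{code}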
*3.* We also need to consider the behavior when the size hovers around the threshold. Alternating
add/remove could cause the list to switch back and forth frequently. Maybe we can define separate
thresholds for add and remove, with the remove threshold a bit smaller, as sketched below.
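
Something along these lines, purely as a sketch (the names and numbers are illustrative, not from
the patch):
{code}
// Hysteresis between the two thresholds keeps alternating add/remove around a
// single boundary from flipping the backing list on every operation.
class SwitchPolicySketch {
  private static final int SWITCH_UP_THRESHOLD = 60000;   // grow past this: switch to hashed list
  private static final int SWITCH_DOWN_THRESHOLD = 40000; // shrink below this: switch back

  private boolean usingHashedList = false;

  void checkSize(int size) {
    if (!usingHashedList && size > SWITCH_UP_THRESHOLD) {
      usingHashedList = true;    // would switch to the hashed/segmented structure here
    } else if (usingHashedList && size < SWITCH_DOWN_THRESHOLD) {
      usingHashedList = false;   // only switch back after shrinking well below the add threshold
    }
  }
}
{code}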



> Support for more efficient large directories
> --------------------------------------------
>
>                 Key: HDFS-7174
>                 URL: https://issues.apache.org/jira/browse/HDFS-7174
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch
>
>
> When the number of children under a directory grows very large, insertion becomes very
> costly. E.g. creating 1M entries takes tens of minutes. This is because the complexity of
> an insertion is O(n). As the size of the list grows, the total overhead grows as n^2 (the
> integral of a linear function). It also causes allocations and copies of big arrays.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
