hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7174) Support for more efficient large directories
Date Thu, 02 Oct 2014 15:01:34 GMT

    [ https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156643#comment-14156643 ]

Kihwal Lee commented on HDFS-7174:

I manually ran test-patch and verified that the patch generates no new javac warnings; I also
checked the diff of the warnings. In fact, the build log says this:
There appear to be 1264 javac compiler warnings before the patch and 1264 javac compiler warnings
after applying the patch
I am not sure why it decided to report two new warnings.

The failures in TestCommitBlockSynchronization are caused by the patch. Without the patch,
searching a directory for an inode with a {{null}} name always returns a valid index, 0,
even if there is no such entry. So, if the name of an inode is {{null}}, every {{INodeDirectory}}
instance will think it has this inode as a child! The test case was relying on this broken
behavior. It doesn't even need to call {{parent.addChild(file)}}.
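
To illustrate why a lookup that reports index 0 for a missing entry is dangerous, here is a minimal sketch. It is not the HDFS code: the class {{NullNameLookup}} and the methods {{brokenSearch}}, {{strictSearch}} and {{found}} are hypothetical names made up for this example, and child names are plain strings rather than inodes. It only assumes the usual convention that a non-negative search result means "found".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class NullNameLookup {
    // Hypothetical stand-in for a directory's sorted list of child names.
    private final List<String> children = new ArrayList<>();

    void addChild(String name) {
        int pos = Collections.binarySearch(children, name);
        if (pos < 0) {
            // A negative result encodes the insertion point as -(point) - 1.
            children.add(-(pos + 1), name);
        }
    }

    // Mimics the broken behavior described above: a null name maps to index 0,
    // whether or not the directory has any such child.
    int brokenSearch(String name) {
        if (name == null) {
            return 0;
        }
        return Collections.binarySearch(children, name);
    }

    // Reports a missing entry with a negative value, including for a null name.
    int strictSearch(String name) {
        if (name == null) {
            return -1;
        }
        return Collections.binarySearch(children, name);
    }

    // Callers conventionally treat a non-negative index as "child exists".
    static boolean found(int index) {
        return index >= 0;
    }

    public static void main(String[] args) {
        NullNameLookup emptyDir = new NullNameLookup();
        System.out.println("broken: " + found(emptyDir.brokenSearch(null)));
        System.out.println("strict: " + found(emptyDir.strictSearch(null)));
    }
}
```

With the broken variant, even an empty directory claims to contain the nameless entry, so a test can pass without the child ever having been added.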

I will fix the test case.

> Support for more efficient large directories
> --------------------------------------------
>                 Key: HDFS-7174
>                 URL: https://issues.apache.org/jira/browse/HDFS-7174
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7174.patch, HDFS-7174.patch
> When the number of children under a directory grows very large, insertion becomes very
costly. E.g., creating 1M entries takes tens of minutes. This is because the complexity of
a single insertion is O(n). As the list grows, the total overhead grows as n^2 (the integral of
a linear function). It also causes allocations and copies of big arrays.
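
The quadratic growth can be checked with a small sketch. It assumes the children are kept in a sorted, array-backed list (a plain {{ArrayList}} here, which is an assumption for illustration and not the HDFS data structure), and it inserts keys in the worst-case order so that every insertion lands at index 0. The class name {{SortedInsertCost}} and the method {{totalShifts}} are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class SortedInsertCost {
    /**
     * Inserts the keys n-1, n-2, ..., 0 into a sorted ArrayList and returns
     * how many existing elements had to be shifted right in total.
     */
    static long totalShifts(int n) {
        List<Integer> sorted = new ArrayList<>();
        long shifts = 0;
        for (int key = n - 1; key >= 0; key--) {
            // The key is never present, so the result is negative and
            // encodes the insertion point as -(point) - 1.
            int pos = Collections.binarySearch(sorted, key);
            int insertAt = -(pos + 1);
            // add(index, e) moves every element at or after index one slot right.
            shifts += sorted.size() - insertAt;
            sorted.add(insertAt, key);
        }
        return shifts;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1_000, 2_000, 4_000}) {
            System.out.println(n + " inserts -> " + totalShifts(n) + " shifts");
        }
    }
}
```

In this worst case the k-th insertion shifts k-1 elements, so the total is 0 + 1 + ... + (n-1) = n(n-1)/2, and doubling n roughly quadruples the work.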

This message was sent by Atlassian JIRA
