hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-7174) Support for more efficient large directories
Date Fri, 03 Oct 2014 18:29:34 GMT

https://issues.apache.org/jira/browse/HDFS-7174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14158305#comment-14158305

Konstantin Shvachko commented on HDFS-7174:

This is a long-standing issue, which people tend to solve externally by creating artificial
hierarchies of directories, similar to block directories on DataNodes. An internal solution
would be great to have. I am probably late to the party, but for whatever it's worth:
did you consider using balanced trees for the inode lists, something like B-trees?
If the B-tree block size is large enough, it behaves like an array (a single-node tree),
and as the directory grows it splits itself automatically.
For most directories this retains the current behaviour, but one can also have a completely
flat namespace without worrying about long arrays.
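The adaptive behaviour described above can be sketched roughly as follows. This is a hypothetical illustration, not HDFS code: the `AdaptiveChildren` class, the threshold value, and the use of `java.util.TreeMap` (a red-black tree standing in for a real B-tree) are all assumptions for the sake of the example.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: a directory's child list that behaves like a plain
// sorted array while small, then "splits" into a balanced tree once it grows.
// TreeMap is used here only as a stand-in for a B-tree node structure.
public class AdaptiveChildren {
    private static final int SPLIT_THRESHOLD = 8; // illustrative, not an HDFS value

    private List<String> array = new ArrayList<>();   // small directories
    private TreeMap<String, String> tree = null;      // large directories

    public void add(String name) {
        if (tree != null) {
            tree.put(name, name);                     // O(log n) once in tree form
            return;
        }
        int pos = Collections.binarySearch(array, name);
        if (pos < 0) {
            array.add(-pos - 1, name);                // O(n) shift, fine while n is small
        }
        if (array.size() > SPLIT_THRESHOLD) {         // convert to tree form
            tree = new TreeMap<>();
            for (String s : array) tree.put(s, s);
            array = null;
        }
    }

    public boolean contains(String name) {
        return tree != null ? tree.containsKey(name)
                            : Collections.binarySearch(array, name) >= 0;
    }

    public int size() {
        return tree != null ? tree.size() : array.size();
    }
}
```

The point is that small directories pay no extra cost over today's sorted array, while large ones stop paying the O(n) shift on every insert.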

> Support for more efficient large directories
> --------------------------------------------
>                 Key: HDFS-7174
>                 URL: https://issues.apache.org/jira/browse/HDFS-7174
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Kihwal Lee
>            Assignee: Kihwal Lee
>            Priority: Critical
>         Attachments: HDFS-7174.new.patch, HDFS-7174.patch, HDFS-7174.patch
> When the number of children under a directory grows very large, insertion becomes very
> costly. E.g. creating 1M entries takes tens of minutes. This is because a single insertion
> is O(n), so building the whole list costs O(n^2) in total (the sum of a linear function).
> It also causes allocations and copies of big arrays.
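To make the quoted cost concrete: inserting the i-th entry into a sorted array shifts up to i elements, so building n entries one at a time performs roughly 0 + 1 + ... + (n-1) = n(n-1)/2 element moves. A minimal sketch of that closed form (the class name and numbers are illustrative, not from the issue):

```java
// Total element shifts to build a sorted array of n children, one insert
// at a time: each insert shifts O(n) elements, summing to n(n-1)/2.
public class InsertCost {
    static long totalShifts(long n) {
        return n * (n - 1) / 2;   // closed form of 0 + 1 + ... + (n-1)
    }

    public static void main(String[] args) {
        // For 1M entries this is about 5e11 element moves, which matches
        // the "tens of minutes" observation in the issue description.
        System.out.println(totalShifts(1_000_000L));
    }
}
```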

This message was sent by Atlassian JIRA
