hadoop-hdfs-issues mailing list archives

From "Matt Foley (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-3290) Use a better local directory layout for the datanode
Date Wed, 18 Apr 2012 18:26:40 GMT

    [ https://issues.apache.org/jira/browse/HDFS-3290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13256783#comment-13256783 ]

Matt Foley commented on HDFS-3290:
----------------------------------

Hi Colin,
I think you've misunderstood how blocks are stored. Each data sub-directory stores the next
64 blocks (by default) and their metadata files (128 files altogether); once it is full, the
DataNode spawns up to 64 new subdirectories and starts filling those, recursing as necessary.
The result is a directory tree in which each sub-directory holds at most 192 entries (64 block
files, 64 metadata files, and 64 subdirectories), and the leaf directories hold 128 or fewer.

Please see org.apache.hadoop.hdfs.server.datanode.FSDataset.FSDir.addBlock() in the hadoop-1
branch.
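
A minimal Python sketch of the layout described above, assuming a breadth-first fill order
(the root directory takes 64 blocks, then each of its 64 subdirectories takes 64, and so on).
It is not the hadoop-1 FSDir.addBlock() code, whose policy for choosing which subdirectory to
fill may differ; the "subdir" name prefix and the helper names are assumptions for
illustration. It checks the arithmetic behind the limits: 64 block files + 64 metadata files
+ 64 subdirectories = 192 entries in a full interior directory, while a leaf holds at most
64 * 2 = 128 files.

```python
from collections import Counter

MAX_BLOCKS_PER_DIR = 64   # default per-directory block limit from the comment
SUBDIR_PREFIX = "subdir"  # assumed directory-name prefix, for illustration only


def block_dir(n):
    """Return the directory, as a tuple of child indices from the root, that
    holds the n-th stored block (0-based), assuming a breadth-first fill:
    the root takes 64 blocks, then each of its 64 children takes 64, then
    each of their children, and so on."""
    level, first_at_level, dirs_at_level = 0, 0, 1
    # Skip whole levels of the tree until n falls inside the current one.
    while n >= first_at_level + dirs_at_level * MAX_BLOCKS_PER_DIR:
        first_at_level += dirs_at_level * MAX_BLOCKS_PER_DIR
        dirs_at_level *= MAX_BLOCKS_PER_DIR
        level += 1
    dir_index = (n - first_at_level) // MAX_BLOCKS_PER_DIR
    # Writing dir_index in base 64 with `level` digits gives one child
    # index per step down the tree.
    path = []
    for _ in range(level):
        dir_index, digit = divmod(dir_index, MAX_BLOCKS_PER_DIR)
        path.append(digit)
    return tuple(reversed(path))


def render(path):
    """Relative directory name, e.g. (1, 1) -> 'subdir1/subdir1'."""
    return "/".join(f"{SUBDIR_PREFIX}{i}" for i in path) or "."


def entries_per_dir(num_blocks):
    """Count entries in each directory after storing num_blocks blocks:
    two files per block (data + metadata) plus one per subdirectory.
    Subdirectories are counted only once they hold a block."""
    counts = Counter()
    for n in range(num_blocks):
        counts[block_dir(n)] += 2
    for d in list(counts):
        if d:  # every non-root directory is one entry in its parent
            counts[d[:-1]] += 1
    return counts


if __name__ == "__main__":
    counts = entries_per_dir(64 + 64 * 64 + 1)
    print(max(counts.values()))      # 192
    print(render(block_dir(8320)))   # subdir1/subdir1
```

With 4,161 blocks (64 + 64*64 + 1), the root and all 64 first-level subdirectories are full
and one block has spilled into a second-level directory; the root then holds its maximum of
192 entries, and no directory exceeds that.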
                
> Use a better local directory layout for the datanode
> ----------------------------------------------------
>
>                 Key: HDFS-3290
>                 URL: https://issues.apache.org/jira/browse/HDFS-3290
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 0.23.0
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>
> When the HDFS DataNode stores chunks in a local directory, it currently puts all of the
> chunk files into one big directory.  As the number of files increases, this does not work
> well at all.  Local filesystems are not optimized for the case where there are hundreds of
> thousands of files in the same directory.  It also makes inspecting directories with standard
> UNIX tools difficult.
> Like the git version control system, HDFS should create a few different top-level
> directories keyed off a few bits in the chunk ID.  Git uses 8 bits.  This substantially
> cuts down on the number of chunk files in the same directory and improves performance.
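
The proposal in the quoted description can be sketched in a few lines of Python. Which bits
to use is not specified there; git takes the first two hex digits (8 bits) of an object's
SHA-1 hash, but block IDs are not hashes, so this sketch assumes the low-order 8 bits, which
spread consecutive IDs evenly across 256 top-level directories. The function names are
hypothetical.

```python
from collections import Counter

BUCKET_BITS = 8  # git keys its object directories off 8 bits (2 hex digits)


def bucket_dir(block_id, bits=BUCKET_BITS):
    """Top-level directory name for a block, keyed off the low-order `bits`
    bits of its ID and printed as zero-padded hex, git-style."""
    mask = (1 << bits) - 1
    width = (bits + 3) // 4  # hex digits needed to show `bits` bits
    # Python's & uses two's-complement semantics, so a negative ID (a Java
    # long may be negative) still maps into the range 0..mask.
    return format(block_id & mask, f"0{width}x")


def bucket_sizes(block_ids, bits=BUCKET_BITS):
    """How many blocks would land in each top-level directory."""
    return Counter(bucket_dir(b, bits) for b in block_ids)


if __name__ == "__main__":
    sizes = bucket_sizes(range(256_000))
    print(len(sizes), min(sizes.values()), max(sizes.values()))  # 256 1000 1000
```

With 256,000 consecutive IDs, every directory receives exactly 1,000 blocks (2,000 files,
counting one metadata file per block), compared with 512,000 files in a single flat directory.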

--
This message is automatically generated by JIRA.

        
