hadoop-hdfs-issues mailing list archives

From "Kihwal Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-6482) Use block ID-based block layout on datanodes
Date Mon, 09 Jun 2014 21:23:02 GMT

    [ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025761#comment-14025761 ]

Kihwal Lee commented on HDFS-6482:
----------------------------------

Block IDs are sequential nowadays. With the proposed block distribution method, leaf dirs
can become severely unbalanced, especially in smaller clusters. Besides the cost of looking
up entries in a directory, directory lock contention can become high and hurt performance
if many files are created in and read from a small set of directories. I think limiting the number
to 64 effectively capped how contentious it can get. We might do better by distributing
blocks more evenly.

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many subdirectories
> when capacity is reached. Instead we can use a block's ID to determine the path it should
> go in. This eliminates the need for the LDir data structure that facilitates the splitting
> of directories when they reach capacity as well as fields in ReplicaInfo that keep track of
> a replica's location.
> An extension of the work in HDFS-3290.
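
For illustration only, here is a minimal Java sketch of what deriving a directory from a block ID could look like. It is not taken from the attached patches: the class name, the two-level "subdirN/subdirM" naming, the 8-bit fan-out per level, and the choice of which ID bits to use are all assumptions made for the example. The helper leafDirsUsed counts how many leaf directories a run of sequential IDs touches under that assumed mapping, which is one way to look at the balance concern raised in the comment.

```java
import java.io.File;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; names and bit choices are assumptions, not the patch's code.
final class BlockIdLayoutSketch {
    // Assumed fan-out: 8 bits (256 entries) per directory level.
    static final int FANOUT_BITS = 8;
    static final int MASK = (1 << FANOUT_BITS) - 1;

    /** Maps a block ID to a leaf directory using bits 16-23 and bits 8-15 of the ID. */
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >>> (2 * FANOUT_BITS)) & MASK);
        int d2 = (int) ((blockId >>> FANOUT_BITS) & MASK);
        return new File(new File(root, "subdir" + d1), "subdir" + d2);
    }

    /** Counts the distinct leaf directories touched by `count` sequential IDs. */
    static int leafDirsUsed(long firstId, int count) {
        Set<String> dirs = new HashSet<>();
        File root = new File("finalized");
        for (long id = firstId; id < firstId + count; id++) {
            dirs.add(idToBlockDir(root, id).getPath());
        }
        return dirs.size();
    }

    public static void main(String[] args) {
        // The path depends only on the ID, so no per-replica location field is needed.
        System.out.println(idToBlockDir(new File("finalized"), 0x12345L));
        // With this bit choice, 256 consecutive IDs share a single leaf directory.
        System.out.println(leafDirsUsed(0L, 256));
        System.out.println(leafDirsUsed(0L, 1024));
    }
}
```

Under these assumptions, a burst of sequentially allocated blocks written to one datanode lands in very few leaf directories, while a mapping keyed on the lowest bits or on a hash of the ID would spread the same burst more widely. Which trade-off is appropriate is exactly what the discussion on this issue is about.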



--
This message was sent by Atlassian JIRA
(v6.2#6252)
