hadoop-hdfs-issues mailing list archives

From "James Thomas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6482) Use block ID-based block layout on datanodes
Date Tue, 10 Jun 2014 17:59:02 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Thomas updated HDFS-6482:
-------------------------------

    Attachment: HDFS-6482.3.patch

Made the changes Arpit suggested. I don't think deletion of empty directories is necessary
-- it was not done in the previous scheme, the benefit in terms of faster directory listings
and lookups seems marginal, and there is some chance that the directory will be recreated
at a later time. I have added a third subdir level (using the 25th through 32nd bits of the
block ID) to further reduce the likelihood of directory blowup in large clusters. For a
cluster with N blocks (to clarify, N blocks created over the lifetime of the cluster, some
of which may since have been deleted), the upper bound on the number of files in any DN
directory is now N/2^24, so even clusters with 2^30 (~1 billion) blocks created over their
lifetimes should have fairly small directories (on the order of 2^30/2^24 = 64 files each).
I don't think we need further logic to keep a directory from exceeding 256 entries, since
that can't happen anyway for clusters with fewer than 2^32 blocks created, and even beyond
that the probability is very small.
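
For illustration only (this is not code from the attached patch), here is a minimal Java
sketch of how a block ID could be mapped to a three-level subdirectory path, assuming each
level consumes 8 bits of the ID as described above; the class and method names, and the
exact assignment of bits to levels, are assumptions made for the example:

    // Sketch: derive a three-level subdir path from a block ID, 8 bits per level,
    // mirroring the N/2^24 per-directory figure discussed above.
    public class BlockIdPathSketch {
      static String blockIdToSubdirPath(long blockId) {
        int level1 = (int) ((blockId >>> 24) & 0xFF); // bits 25-32 (the new third level)
        int level2 = (int) ((blockId >>> 16) & 0xFF); // bits 17-24
        int level3 = (int) ((blockId >>> 8) & 0xFF);  // bits 9-16
        return "subdir" + level1 + "/subdir" + level2 + "/subdir" + level3;
      }

      public static void main(String[] args) {
        // Example block ID just above 2^30; prints the subdirectory path it maps to.
        System.out.println(blockIdToSubdirPath(1073742007L));
      }
    }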

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.3.patch, HDFS-6482.patch
>
>
> Right now, blocks are placed into directories that are split into many subdirectories when
> capacity is reached. Instead, we can use a block's ID to determine the path it should go in.
> This eliminates the need for the LDir data structure that facilitates the splitting of
> directories when they reach capacity, as well as the fields in ReplicaInfo that keep track
> of a replica's location.
> An extension of the work in HDFS-3290.



--
This message was sent by Atlassian JIRA
(v6.2#6252)
