hadoop-hdfs-issues mailing list archives

From "James Thomas (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (HDFS-6482) Use block ID-based block layout on datanodes
Date Sat, 07 Jun 2014 00:35:02 GMT

     [ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

James Thomas updated HDFS-6482:

    Attachment: HDFS-6482.2.patch

Made all changes suggested by Colin. Heap dumps I've taken on a single-machine cluster (one DN)
holding between 100k and 250k blocks indicate that this change reduces DN memory consumption
by roughly 15-20%, excluding scanner memory. The savings come from eliminating the subdirs array
from ReplicaInfo and the LDir structure from BlockPoolSlice. Both the directory scanner and the
block scanner were turned off in the test setup, because their transient memory usage makes it
hard to compare memory consumption between versions.

> Use block ID-based block layout on datanodes
> --------------------------------------------
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
> Right now, blocks are placed into directories that are split into many subdirectories
> when they reach capacity. Instead, we can use a block's ID to determine the path it should
> go in. This eliminates both the LDir data structure, which handles splitting directories
> when they reach capacity, and the fields in ReplicaInfo that track a replica's location.
> An extension of the work in HDFS-3290.
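To illustrate the idea in the description, here is a minimal sketch of mapping a block ID to a two-level directory path. It is not the patch's code: the class name `BlockIdLayout`, the method `idToBlockDir`, the `subdir` prefix, and the choice of bits 16-23 and 8-15 for the two levels are all hypothetical, chosen only for illustration. Because the directory is a pure function of the ID, a replica's location can be recomputed on demand instead of being stored per replica, which is where the memory savings reported above come from.

```java
import java.io.File;

// Hypothetical sketch of a block ID-based layout; names and bit positions are assumptions.
class BlockIdLayout {
    // Assumed prefix for subdirectory names (illustrative only).
    static final String SUBDIR_PREFIX = "subdir";

    /**
     * Maps a block ID to a two-level directory under root.
     * Assumption: bits 16-23 pick the first level and bits 8-15 the second,
     * so there are at most 256 x 256 leaf directories. The 0xff mask keeps
     * each index in 0..255 even for negative IDs.
     */
    static File idToBlockDir(File root, long blockId) {
        int d1 = (int) ((blockId >> 16) & 0xff);
        int d2 = (int) ((blockId >> 8) & 0xff);
        return new File(new File(root, SUBDIR_PREFIX + d1), SUBDIR_PREFIX + d2);
    }

    public static void main(String[] args) {
        File root = new File("finalized");
        // 0x40123456: bits 16-23 are 0x12 (18), bits 8-15 are 0x34 (52).
        File dir = idToBlockDir(root, 0x40123456L);
        System.out.println(dir.getPath());
    }
}
```

With this scheme no directory ever needs to be split: the low 8 bits are ignored, so up to 256 consecutive IDs share one leaf directory, and the lookup needs no in-memory directory tree.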

This message was sent by Atlassian JIRA
