Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
Reply-To: hdfs-issues@hadoop.apache.org
Date: Sat, 7 Jun 2014 00:35:02 +0000 (UTC)
From: "James Thomas (JIRA)" <jira@apache.org>
To: hdfs-issues@hadoop.apache.org
Message-ID: <JIRA.12718250.1401828568089.86118.1402101302810@arcas>
In-Reply-To: <JIRA.12718250.1401828568089@arcas>
References: <JIRA.12718250.1401828568089@arcas>
Subject: [jira] [Updated] (HDFS-6482) Use block ID-based block layout on
 datanodes
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit


     [ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Thomas updated HDFS-6482:
-------------------------------

    Attachment: HDFS-6482.2.patch

Made all changes suggested by Colin. Heap dumps taken on a single-machine cluster (one DN) holding anywhere from 100k to 250k blocks indicate that this change reduces DN memory consumption by roughly 15-20%, excluding scanner memory consumption. The savings come from eliminating the subdirs array from ReplicaInfo and the LDir structure from BlockPoolSlice. Both the directory and block scanners were turned off in the test setup, since their transient memory usage prevents easy comparison of memory usage between versions.

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many subdirectories when they reach capacity. Instead, we can use a block's ID to determine the directory it should be stored in. This eliminates the need for the LDir data structure, which facilitates the splitting of directories when they reach capacity, as well as the fields in ReplicaInfo that keep track of a replica's location.
> An extension of the work in HDFS-3290.
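
To illustrate the idea in the description, here is a minimal sketch of deriving a block's directory purely from its ID. It is not taken from the attached patches: the class name BlockIdLayout, the method idToBlockDir, the two-level layout, the choice of 8 bits per level, and the "subdir" naming are all assumptions made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not the code from the HDFS-6482 patches.
final class BlockIdLayout {
    // Assumed layout: two directory levels, 256 entries per level.
    private static final int BITS_PER_LEVEL = 8;
    private static final long LEVEL_MASK = (1L << BITS_PER_LEVEL) - 1;

    private BlockIdLayout() {}

    // Maps a block ID to a directory under the given root. The result
    // depends only on the ID, so no per-replica location state is needed.
    static Path idToBlockDir(Path root, long blockId) {
        // Use bits 16-23 for the first level and bits 8-15 for the second.
        // Masking keeps each index in 0..255 even for negative IDs.
        int first = (int) ((blockId >> 16) & LEVEL_MASK);
        int second = (int) ((blockId >> 8) & LEVEL_MASK);
        return root.resolve("subdir" + first).resolve("subdir" + second);
    }

    public static void main(String[] args) {
        Path root = Paths.get("finalized");
        System.out.println(idToBlockDir(root, 0x12345678L));
    }
}
```

Because the path is a pure function of the block ID, a replica's directory can be recomputed on demand instead of being stored, which is the property the description relies on to drop LDir and the location fields in ReplicaInfo. Skipping the lowest 8 bits in this sketch means that runs of up to 256 consecutive IDs share a directory; a real layout would pick the bit ranges based on expected block counts per volume.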

