Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Reply-To: hdfs-issues@hadoop.apache.org
Date: Sat, 7 Jun 2014 00:35:02 +0000 (UTC)
From: "James Thomas (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Updated] (HDFS-6482) Use block ID-based block layout on datanodes

[ https://issues.apache.org/jira/browse/HDFS-6482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

James Thomas updated HDFS-6482:
-------------------------------
    Attachment: HDFS-6482.2.patch

Made all changes suggested by Colin. Heap dumps I've taken on a single-machine cluster (one DN) holding anywhere from 100k to 250k blocks indicate that this change reduces DN memory consumption by roughly 15-20%, excluding scanner memory consumption. The savings come from eliminating the subdirs array from ReplicaInfo and the LDir structure from BlockPoolSlice.
Both the directory and block scanners were turned off in the test setup, since the scanners have transient memory usage that prevents easy comparison of memory usage between versions.

> Use block ID-based block layout on datanodes
> --------------------------------------------
>
>                 Key: HDFS-6482
>                 URL: https://issues.apache.org/jira/browse/HDFS-6482
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 2.5.0
>            Reporter: James Thomas
>            Assignee: James Thomas
>         Attachments: HDFS-6482.1.patch, HDFS-6482.2.patch, HDFS-6482.patch
>
>
> Right now blocks are placed into directories that are split into many subdirectories when capacity is reached. Instead we can use a block's ID to determine the path it should go in. This eliminates the need for the LDir data structure that facilitates the splitting of directories when they reach capacity as well as fields in ReplicaInfo that keep track of a replica's location.
> An extension of the work in HDFS-3290.
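
For illustration only, the idea in the quoted description (deriving a block's directory from its ID instead of tracking it in per-replica state) could be sketched roughly as below. This is not taken from the attached patch: the class name BlockIdLayout, the method idToBlockDir, the two-level 256-way fan-out, the choice of bits, and the "subdirN" directory naming are all assumptions made for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch: map a block ID to a fixed two-level directory.
// The fan-out and naming are assumptions, not the patch's actual scheme.
final class BlockIdLayout {
    // Assumed: 8 bits per level, i.e. 256 subdirectories at each level.
    static final int BITS_PER_LEVEL = 8;
    static final long MASK = (1L << BITS_PER_LEVEL) - 1;

    // The path is a pure function of the block ID, so no per-replica
    // location field or directory-splitting structure is needed to find it.
    static Path idToBlockDir(Path finalizedRoot, long blockId) {
        // Bits 16..23 pick the first level, bits 8..15 the second.
        // Masking after the shift keeps both indices in 0..255,
        // including for negative block IDs.
        int d1 = (int) ((blockId >> 16) & MASK);
        int d2 = (int) ((blockId >> 8) & MASK);
        return finalizedRoot.resolve("subdir" + d1).resolve("subdir" + d2);
    }

    public static void main(String[] args) {
        Path root = Paths.get("finalized");
        // The same ID always resolves to the same directory.
        System.out.println(idToBlockDir(root, 0x12345678L));
    }
}
```

Because the directory is recomputed from the ID on demand, a lookup needs nothing beyond the block ID and the root of the finalized directory, which is consistent with the memory savings reported above.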