Date: Fri, 1 Feb 2013 17:34:12 +0000 (UTC)
From: "Suresh Srinivas (JIRA)"
To: hdfs-issues@hadoop.apache.org
Subject: [jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned

    [ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568901#comment-13568901 ]

Suresh Srinivas commented on HDFS-4461:
---------------------------------------

bq. This has been causing out-of-memory conditions for users who pick such long volume paths.

I doubt that the directory scanner is the cause of the OOM error; it is probably happening due to some other issue. How many blocks per storage directory did you have when the OOME happened?

bq. here's a before vs. after picture of a memory analysis.
bq. you can see that in the "after" picture, we are no longer storing the path prefix twice per block in the ScanInfo class

I have a hard time understanding the picture. How many bytes are we saving per ScanInfo?

> DirectoryScanner: volume path prefix takes up memory for every block that is scanned
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a {{ScanInfo}} object for every block. This object contains two File objects: one for the metadata file and one for the block file. Since those File objects contain full paths, users who pick a lengthy path for their volume roots will end up using an extra N_blocks * path_prefix bytes per block scanned. We also don't really need to store File objects; storing strings and then creating File objects as needed would be cheaper. This has been causing out-of-memory conditions for users who pick such long volume paths.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
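The idea in the issue description can be sketched roughly as follows. This is a minimal illustration of the suffix-storage approach, not the actual HDFS-4461 patch; the class and field names here are hypothetical:

```java
import java.io.File;

// Hypothetical sketch: instead of keeping two java.io.File objects per
// block (each holding a copy of the full volume-root prefix), a ScanInfo
// could store only the path suffixes relative to the volume root and
// materialize File objects on demand. The long prefix is then stored
// once per volume rather than twice per block.
class ScanInfoSketch {
    private final File volumeRoot;     // shared; one instance per volume
    private final String blockSuffix;  // e.g. "current/blk_123"
    private final String metaSuffix;   // e.g. "current/blk_123_1001.meta"

    ScanInfoSketch(File volumeRoot, String blockSuffix, String metaSuffix) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = blockSuffix;
        this.metaSuffix = metaSuffix;
    }

    // Full paths are rebuilt only when a caller actually needs them.
    File getBlockFile() {
        return new File(volumeRoot, blockSuffix);
    }

    File getMetaFile() {
        return new File(volumeRoot, metaSuffix);
    }
}
```

With this layout, the per-block memory cost depends only on the suffix lengths, not on how deep the operator's volume path is.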