Return-Path:
X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id A2721E7A5
    for ; Fri, 1 Feb 2013 19:48:17 +0000 (UTC)
Received: (qmail 64456 invoked by uid 500); 1 Feb 2013 19:48:17 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 64418 invoked by uid 500); 1 Feb 2013 19:48:17 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 64408 invoked by uid 99); 1 Feb 2013 19:48:17 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Feb 2013 19:48:17 +0000
Date: Fri, 1 Feb 2013 19:48:17 +0000 (UTC)
From: "Andy Isaacson (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569013#comment-13569013 ]

Andy Isaacson commented on HDFS-4461:
-------------------------------------

bq. A server generally has a lot of String objects. There are also file objects in ReplicasMap, string paths tracked in many other places as well.

The cluster in question has about 1.5 million blocks per DN, across 12 datadirs. This hprof shows 1,858,340 BlockScanInfo objects. MAT computed the "Retained Heap" of FsDatasetImpl at 980 MB and the "Retained Heap" of the DirectoryScanner thread at 1.4 GB.

bq. ScanInfo is a short lived object, unlike other data structures that are long lived.

It doesn't matter how narrow the peak is if it exceeds the maximum permissible value. In this case we seem to have a complete set of ScanInfo objects (for the entire dataset) active on the heap, with the DirectoryScanner thread in the process of reconcile()ing them when it OOMs.

> DirectoryScanner: volume path prefix takes up memory for every block that is scanned
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a {{ScanInfo}} object for every block. This object contains two File objects -- one for the metadata file, and one for the block file. Since those File objects contain full paths, users who pick a lengthy path for their volume roots will end up using an extra N_blocks * path_prefix bytes of memory. We also don't really need to store File objects -- storing strings and then creating File objects as needed would be cheaper. This would be a nice efficiency improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
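
As a rough illustration of the arithmetic in the description: with about 1.5 million blocks per DN and a volume prefix of, say, 60 characters duplicated in two full-path File objects per block, the prefix characters alone come to roughly 1.5M * 2 * 60 * 2 bytes, or about 360 MB, before any per-object overhead. Below is a minimal sketch of the string-suffix idea the description suggests -- store the volume root once and keep only per-block relative paths, building File objects on demand. The class and field names are hypothetical and this is not the actual HDFS-4461 patch.

{code:java}
// Hypothetical sketch: keep the volume root once per volume and store only
// per-block relative path suffixes as Strings, constructing File objects
// lazily instead of retaining two full-path File objects for every block.
import java.io.File;

class CompactScanInfo {
    private final File volumeRoot;     // shared across all blocks on the volume
    private final String blockSuffix;  // e.g. relative path to the block file
    private final String metaSuffix;   // e.g. relative path to the metadata file

    CompactScanInfo(File volumeRoot, String blockSuffix, String metaSuffix) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = blockSuffix;
        this.metaSuffix = metaSuffix;
    }

    // File objects are built on demand, so the long volume prefix is never
    // duplicated per block in long-lived heap objects.
    File getBlockFile() {
        return new File(volumeRoot, blockSuffix);
    }

    File getMetaFile() {
        return metaSuffix == null ? null : new File(volumeRoot, metaSuffix);
    }
}
{code}

Whether this shrinks the DirectoryScanner's retained heap in practice depends on how long the full set of ScanInfo objects stays live during reconcile(), as discussed in the comment above.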