Return-Path:
X-Original-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Delivered-To: apmail-hadoop-hdfs-issues-archive@minotaur.apache.org
Received: from mail.apache.org (hermes.apache.org [140.211.11.3])
    by minotaur.apache.org (Postfix) with SMTP id A2721E7A5
    for ; Fri, 1 Feb 2013 19:48:17 +0000 (UTC)
Received: (qmail 64456 invoked by uid 500); 1 Feb 2013 19:48:17 -0000
Delivered-To: apmail-hadoop-hdfs-issues-archive@hadoop.apache.org
Received: (qmail 64418 invoked by uid 500); 1 Feb 2013 19:48:17 -0000
Mailing-List: contact hdfs-issues-help@hadoop.apache.org; run by ezmlm
Precedence: bulk
List-Help:
List-Unsubscribe:
List-Post:
List-Id:
Reply-To: hdfs-issues@hadoop.apache.org
Delivered-To: mailing list hdfs-issues@hadoop.apache.org
Received: (qmail 64408 invoked by uid 99); 1 Feb 2013 19:48:17 -0000
Received: from arcas.apache.org (HELO arcas.apache.org) (140.211.11.28)
    by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 01 Feb 2013 19:48:17 +0000
Date: Fri, 1 Feb 2013 19:48:17 +0000 (UTC)
From: "Andy Isaacson (JIRA)"
To: hdfs-issues@hadoop.apache.org
Message-ID:
In-Reply-To:
References:
Subject: [jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394

    [ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569013#comment-13569013 ]

Andy Isaacson commented on HDFS-4461:
-------------------------------------

bq. A server generally has a lot of String objects. There are also file objects in ReplicasMap, string paths tracked in many other places as well.

The cluster in question has about 1.5 million blocks per DN, across 12 datadirs. This hprof shows 1,858,340 BlockScanInfo objects. MAT computed the "Retained Heap" of FsDatasetImpl at 980 MB and the "Retained Heap" of the DirectoryScanner thread at 1.4 GB.

bq. ScanInfo is a short lived object, unlike other data structures that are long lived.

It doesn't matter how narrow the peak is if it exceeds the maximum permissible value. In this case we seem to have a complete set of ScanInfo objects (for the entire dataset) active on the heap, with the DirectoryScanner thread in the process of reconcile()ing them when it OOMs.

> DirectoryScanner: volume path prefix takes up memory for every block that is scanned
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a {{ScanInfo}} object for every block. This object contains two File objects -- one for the metadata file, and one for the block file. Since those File objects contain full paths, users who pick a lengthy path for their volume roots will end up using an extra N_blocks * path_prefix bytes of memory. We also don't really need to store File objects -- storing strings and then creating File objects as needed would be cheaper. This would be a nice efficiency improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
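
As a rough illustration of the arithmetic in the description: with about 1.5 million blocks per DN and a volume prefix of, say, 60 characters duplicated in two full-path File objects per block, the prefix characters alone come to roughly 1.5M * 2 * 60 * 2 bytes, or about 360 MB, before any per-object overhead. Below is a minimal sketch of the string-suffix idea the description suggests -- store the volume root once and keep only per-block relative paths, building File objects on demand. The class and field names are hypothetical and this is not the actual HDFS-4461 patch.

{code:java}
// Hypothetical sketch: keep the volume root once per volume and store only
// per-block relative path suffixes as Strings, constructing File objects
// lazily instead of retaining two full-path File objects for every block.
import java.io.File;

class CompactScanInfo {
    private final File volumeRoot;     // shared across all blocks on the volume
    private final String blockSuffix;  // e.g. relative path to the block file
    private final String metaSuffix;   // e.g. relative path to the metadata file

    CompactScanInfo(File volumeRoot, String blockSuffix, String metaSuffix) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = blockSuffix;
        this.metaSuffix = metaSuffix;
    }

    // File objects are built on demand, so the long volume prefix is never
    // duplicated per block in long-lived heap objects.
    File getBlockFile() {
        return new File(volumeRoot, blockSuffix);
    }

    File getMetaFile() {
        return metaSuffix == null ? null : new File(volumeRoot, metaSuffix);
    }
}
{code}

Whether this shrinks the DirectoryScanner's retained heap in practice depends on how long the full set of ScanInfo objects stays live during reconcile(), as discussed in the comment above.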