hadoop-hdfs-issues mailing list archives

From "Andy Isaacson (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
Date Fri, 01 Feb 2013 19:48:17 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569013#comment-13569013 ]

Andy Isaacson commented on HDFS-4461:

bq. A server generally has a lot of String objects. There are also file objects in ReplicasMap,
string paths tracked in many other places as well.

The cluster in question has about 1.5 million blocks per DN, across 12 datadirs.  This hprof
shows 1,858,340 BlockScanInfo objects. MAT computed the "Retained Heap" of FsDatasetImpl at
980 MB and the "Retained Heap" of the DirectoryScanner thread at 1.4 GB.

bq. ScanInfo is a short lived object, unlike other data structures that are long lived.

It doesn't matter how narrow the peak is, if it exceeds the maximum permissible value.  In
this case we seem to have a complete set of ScanInfo objects (for the entire dataset) active
on the heap, with the DirectoryScanner thread in the process of reconcile()ing them when it
> DirectoryScanner: volume path prefix takes up memory for every block that is scanned

> -------------------------------------------------------------------------------------
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, memory-analysis.png
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  This object
contains two File objects-- one for the metadata file, and one for the block file.  Since
those File objects contain full paths, users who pick a lengthy path for their volume roots
will end up using roughly N_blocks * path_prefix extra bytes in total.  We also don't
really need to store File objects-- storing strings and then creating File objects as needed
would be cheaper.  This would be a nice efficiency improvement.
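
To make the proposal above concrete, here is a minimal sketch of one way to avoid storing the volume path prefix for every block. It is not the code from the attached patches: the class name CompactScanInfo, the helper suffixOf, and the accessor names are hypothetical, chosen only for illustration. The assumed idea is to keep one shared File per volume root, store only the relative suffix of each path as a String, and build File objects on demand.

```java
import java.io.File;

// Hypothetical sketch; not the ScanInfo class from the HDFS-4461 patches.
final class CompactScanInfo {
    private final File volumeRoot;     // one shared instance per volume
    private final String blockSuffix;  // block file path relative to volumeRoot, or null
    private final String metaSuffix;   // metadata file path relative to volumeRoot, or null

    CompactScanInfo(File volumeRoot, File blockFile, File metaFile) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = suffixOf(volumeRoot, blockFile);
        this.metaSuffix = suffixOf(volumeRoot, metaFile);
    }

    // Strips the volume root prefix so it is not stored once per block.
    static String suffixOf(File root, File file) {
        if (file == null) {
            return null;
        }
        String prefix = root.getAbsolutePath();
        if (!prefix.endsWith(File.separator)) {
            prefix += File.separator;
        }
        String full = file.getAbsolutePath();
        if (!full.startsWith(prefix)) {
            throw new IllegalArgumentException(full + " is not under " + prefix);
        }
        // new String(...) forces a copy on older JVMs where substring()
        // shares the parent's character array and would retain the prefix.
        return new String(full.substring(prefix.length()));
    }

    // File objects are created only when a caller actually needs one.
    File getBlockFile() {
        return blockSuffix == null ? null : new File(volumeRoot, blockSuffix);
    }

    File getMetaFile() {
        return metaSuffix == null ? null : new File(volumeRoot, metaSuffix);
    }

    String getBlockSuffix() {
        return blockSuffix;
    }
}
```

With this layout, the per-block cost of the volume prefix drops to a single reference to the shared root File; the trade-off is a short-lived File allocation each time getBlockFile() or getMetaFile() is called.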
