hadoop-hdfs-issues mailing list archives

From "Colin Patrick McCabe (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
Date Fri, 01 Feb 2013 18:46:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13568959#comment-13568959 ]

Colin Patrick McCabe commented on HDFS-4461:

If someone is running with around 200,000 blocks (a reasonable number) and a 50- to 80-character
path, this change saves between 50 and 100 MB of heap space during the DirectoryScanner run.
That's what we should be focusing on here-- the efficiency improvement.  After all, that
is why I marked this JIRA as "improvement" rather than "bug" :)
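The savings figure above can be sanity-checked with a back-of-envelope calculation. This is a sketch, not a measurement: the ~56 bytes of per-object overhead assumed for each File-plus-String pair is a rough JVM estimate, and actual sizes vary by JVM version and settings.

```java
// Rough heap estimate for the duplicated volume-path prefix.
// Assumptions (not measured): 2 bytes per char in a Java String,
// and ~56 bytes of object/header overhead per File + backing String.
public class PrefixOverhead {
    static final int FILES_PER_BLOCK = 2;   // block file + metadata file
    static final int BYTES_PER_CHAR = 2;    // Java chars are UTF-16
    static final int OBJ_OVERHEAD = 56;     // assumed File + String overhead

    static long overheadBytes(long blocks, int prefixChars) {
        return blocks * FILES_PER_BLOCK
                * (OBJ_OVERHEAD + (long) BYTES_PER_CHAR * prefixChars);
    }

    public static void main(String[] args) {
        // 200,000 blocks with 50- and 80-character path prefixes:
        System.out.println(overheadBytes(200_000L, 50) / (1024 * 1024) + " MB"); // 59 MB
        System.out.println(overheadBytes(200_000L, 80) / (1024 * 1024) + " MB"); // 82 MB
    }
}
```

Under these assumptions the overhead lands at roughly 59-82 MB, consistent with the 50-100 MB range quoted above.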

bq. Or at least the number of ScanInfo objects you saw.

I saw more than 1 million {{ScanInfo}} objects.  This means that either the number of blocks
on the DN is much higher than we recommend, or there is another leak in the {{DirectoryScanner}}.
I am trying to get confirmation that the number of blocks is really that high.  If it isn't,
then we will start looking more closely for memory leaks in the scanner.

We've found that the block scanner often delivers the finishing blow to DNs that are already
overloaded.  This makes sense-- if your heap is already near max size, asking you to allocate
a few hundred megabytes might finish you off.
> DirectoryScanner: volume path prefix takes up memory for every block that is scanned
> -------------------------------------------------------------------------------------
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, memory-analysis.png
> In the {{DirectoryScanner}}, we create a class {{ScanInfo}} for every block.  This object
> contains two File objects-- one for the metadata file, and one for the block file.  Since
> those File objects contain full paths, users who pick a lengthy path for their volume roots
> will end up using an extra N_blocks * path_prefix bytes per block scanned.  We also don't
> really need to store File objects-- storing strings and then creating File objects as needed
> would be cheaper.  This has been causing out-of-memory conditions for users who pick such
> long volume paths.
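The fix the description proposes can be sketched as follows. The class and field names here are illustrative only, not the actual HDFS-4461 patch: store each path as a suffix relative to the (shared) volume root, and build File objects only on demand.

```java
import java.io.File;

// Illustrative sketch of the suffix-storage idea: the volume root is held
// once per volume, and each block stores only the path suffixes. File
// objects are materialized on demand instead of being kept per block.
// Names are hypothetical, not taken from the HDFS codebase.
class CompactScanInfo {
    private final File volumeRoot;    // shared across all blocks on this volume
    private final String blockSuffix; // e.g. "current/.../blk_1001"
    private final String metaSuffix;  // e.g. "current/.../blk_1001_1.meta"

    CompactScanInfo(File volumeRoot, String blockSuffix, String metaSuffix) {
        this.volumeRoot = volumeRoot;
        this.blockSuffix = blockSuffix;
        this.metaSuffix = metaSuffix;
    }

    // Creating the File here means the volume path prefix is not
    // duplicated N_blocks times on the heap during a scan.
    File getBlockFile() {
        return new File(volumeRoot, blockSuffix);
    }

    File getMetaFile() {
        return new File(volumeRoot, metaSuffix);
    }
}
```

With this layout the per-block cost is just the two suffix strings; the prefix is paid for once per volume rather than once per block.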

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
