hadoop-hdfs-issues mailing list archives

From "Suresh Srinivas (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HDFS-4461) DirectoryScanner: volume path prefix takes up memory for every block that is scanned
Date Fri, 01 Feb 2013 22:54:13 GMT

    [ https://issues.apache.org/jira/browse/HDFS-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13569198#comment-13569198 ]

Suresh Srinivas commented on HDFS-4461:
---------------------------------------

I think my earlier comments were perhaps not clear. Let me give it another try :)

+1 for optimizing the data structures in datanode.

bq. Suresh – we routinely see users with millions of replicas per DN now that 48TB+ configurations have become commodity. Sure, we should also encourage users to use things like HAR to coalesce into larger blocks, but easy wins on DN memory usage are a no-brainer IMO.
Again, this is not the point I am making. I know and understand that the number of blocks in a DN is growing, and the data structures in the datanode need to be optimized. At the same time, as DNs support more storage, the DN heap also needs to be increased accordingly.

My previous comments were about the assertion that the DirectoryScanner is causing OOM. The OOM is not caused by the scanner; it is caused by incorrectly sizing the datanode JVM heap, unless someone shows a leak in the DirectoryScanner. So my comment was a request to edit the description to reflect that.

We also need to optimize the long-lived data structures in the datanode. I thought one would start with those instead of the DirectoryScanner, which creates short-lived objects. Created HDFS-4465 to track that.
                
> DirectoryScanner: volume path prefix takes up memory for every block that is scanned
> -------------------------------------------------------------------------------------
>
>                 Key: HDFS-4461
>                 URL: https://issues.apache.org/jira/browse/HDFS-4461
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>    Affects Versions: 2.0.3-alpha
>            Reporter: Colin Patrick McCabe
>            Assignee: Colin Patrick McCabe
>            Priority: Minor
>         Attachments: HDFS-4461.002.patch, HDFS-4461.003.patch, memory-analysis.png
>
>
> In the {{DirectoryScanner}}, we create a {{ScanInfo}} object for every block.  This object
> contains two File objects -- one for the metadata file, and one for the block file.  Since
> those File objects contain full paths, users who pick a lengthy path for their volume roots
> end up using an extra path_prefix bytes for every block scanned, or N_blocks * path_prefix
> bytes in total.  We also don't really need to store File objects -- storing strings and
> creating File objects as needed would be cheaper.  This would be a nice efficiency improvement.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
