hadoop-hdfs-issues mailing list archives

From "Konstantin Shvachko (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HDFS-1110) Namenode heap optimization - reuse objects for commonly used file names
Date Thu, 29 Apr 2010 21:42:55 GMT

    [ https://issues.apache.org/jira/browse/HDFS-1110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12862431#action_12862431 ]

Konstantin Shvachko commented on HDFS-1110:
-------------------------------------------

bq. File names used > 100000 times 	24

What are the names of these 24 files? Do they fall under the proposed default pattern? How
big is the noise if we use the default pattern?

On the one hand I see the point of providing a generic approach for people to specify their
own patterns.
But I also agree with Dhruba that we need to optimize only for the top ten (or so) file names,
which will give us a 5% saving in the meta-data memory footprint. The rest should be ignored;
it would be a waste of resources to optimize for them. Your approach 2 would be a move
in this direction.
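
To illustrate what optimizing only for a short list of names could look like, here is a
minimal sketch. It is not taken from the attached patch: the class SharedNameTable and its
method bytesFor are hypothetical names, and the sketch assumes the shared arrays are never
modified by callers.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper, not part of HDFS: keeps one byte[] per listed file name. */
final class SharedNameTable {
    private final Map<String, byte[]> shared = new HashMap<>();

    /** topNames is the short list of names worth sharing, e.g. the top ten. */
    SharedNameTable(Iterable<String> topNames) {
        for (String name : topNames) {
            shared.put(name, name.getBytes(StandardCharsets.UTF_8));
        }
    }

    /**
     * Returns the shared array when the name is in the list, so every caller
     * holds a reference to the same object. Any other name gets a fresh array,
     * which is the "ignore the rest" part of the idea.
     */
    byte[] bytesFor(String name) {
        byte[] cached = shared.get(name);
        return cached != null ? cached : name.getBytes(StandardCharsets.UTF_8);
    }
}
```

With a fixed list the lookup table stays tiny, so the cost of the optimization itself is
negligible compared to the arrays it avoids allocating.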

So maybe it would be useful to have the tool Jacob mentions (OIV-based), so that admins could
run it offline on the image and get the top N most frequently used names, with an estimate of
how much space this saves. Then they will be able to formulate the regular expression.
Otherwise, it is going to be a painful guessing game.

> Namenode heap optimization - reuse objects for commonly used file names
> -----------------------------------------------------------------------
>
>                 Key: HDFS-1110
>                 URL: https://issues.apache.org/jira/browse/HDFS-1110
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Suresh Srinivas
>            Assignee: Suresh Srinivas
>             Fix For: 0.22.0
>
>         Attachments: hdfs-1110.patch
>
>
> There are a lot of common file names used in HDFS, mainly created by mapreduce, such
> as file names starting with "part". Reusing byte[] corresponding to these recurring file names
> will save significant heap space used for storing the file names in millions of INodeFile
> objects.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

