hadoop-mapreduce-issues mailing list archives

From "Amar Kamat (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-1374) Reduce memory footprint of FileSplit
Date Wed, 20 Jan 2010 14:25:54 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-1374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12802847#action_12802847 ]

Amar Kamat commented on MAPREDUCE-1374:
---------------------------------------

A few comments:
1) String.intern() takes up space in the PermGen area, and JobClient should not hit an OOM
because its PermGen space is low. What should we do about it? With this patch, an existing
client that has little PermGen space and submits a job with large input splits will fail.
2) Can you add a simple testcase that checks the FileSplit getters and also FileSplit
serialization? The reason is that one of the serialized parameters got changed.
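
A rough sketch of the kind of test meant in (2), using a hypothetical stand-in class rather than the real FileSplit. The class name SplitSketch, the roundTrip helper, and the use of writeUTF/readUTF are assumptions made for illustration only; the real class has its own field encoding and the actual test would go through its write() and readFields().

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical stand-in for FileSplit; NOT the Hadoop class.
class SplitSketch {
    private String file;  // held as a String instead of a Path, as the issue proposes
    private long start;
    private long length;

    SplitSketch() {}

    SplitSketch(String file, long start, long length) {
        this.file = file;
        this.start = start;
        this.length = length;
    }

    String getFile() { return file; }
    long getStart() { return start; }
    long getLength() { return length; }

    // Assumed wire format for the sketch: UTF string followed by two longs.
    void write(DataOutput out) throws IOException {
        out.writeUTF(file);
        out.writeLong(start);
        out.writeLong(length);
    }

    void readFields(DataInput in) throws IOException {
        file = in.readUTF();
        start = in.readLong();
        length = in.readLong();
    }

    // Serializes a split to bytes and reads it back into a fresh instance.
    static SplitSketch roundTrip(SplitSketch split) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (DataOutputStream out = new DataOutputStream(buffer)) {
                split.write(out);
            }
            SplitSketch copy = new SplitSketch();
            try (DataInputStream in = new DataInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                copy.readFields(in);
            }
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        SplitSketch original = new SplitSketch("/data/part-00000", 128L, 64L);
        SplitSketch copy = roundTrip(original);

        // The getters should return the same values after a round trip.
        if (!original.getFile().equals(copy.getFile())
                || original.getStart() != copy.getStart()
                || original.getLength() != copy.getLength()) {
            throw new IllegalStateException("round trip changed the split");
        }
        System.out.println("round trip ok");
    }
}
```

The point is only that a changed field type should still produce the same getter values after write() followed by readFields().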


> Reduce memory footprint of FileSplit
> ------------------------------------
>
>                 Key: MAPREDUCE-1374
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-1374
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>    Affects Versions: 0.20.1, 0.21.0, 0.22.0
>            Reporter: Zheng Shao
>            Assignee: Zheng Shao
>             Fix For: 0.21.0, 0.22.0
>
>         Attachments: MAPREDUCE-1374.1.patch, MAPREDUCE-1374.2.patch, MAPREDUCE-1374.3.patch
>
>
> We can have many FileInput objects in the memory, depending on the number of mappers.
> It will save tons of memory on JobTracker and JobClient if we intern those Strings for
> host names.
> {code}
> FileInputFormat.java:
>       for (NodeInfo host: hostList) {
>         // Strip out the port number from the host name
> -        retVal[index++] = host.node.getName().split(":")[0];
> +        retVal[index++] = host.node.getName().split(":")[0].intern();
>         if (index == replicationFactor) {
>           done = true;
>           break;
>         }
>       }
> {code}
> More on String.intern(): http://www.javaworld.com/javaworld/javaqa/2003-12/01-qa-1212-intern.html
> It will also save a lot of memory by changing the class of {{file}} from {{Path}} to
> {{String}}. {{Path}} contains a {{java.net.URI}} which internally contains ~10 String fields.
> This will also be a huge saving.
> {code}
>   private Path file;
> {code}
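
The effect of the intern() call in the quoted patch can be illustrated with a small standalone sketch. The class InternDemo, the method stripPorts, and the sample node names are hypothetical and are not part of FileInputFormat; the sketch only shows that split(":")[0] yields a separate String object per call, while interning collapses equal host names into one shared instance.

```java
// Hypothetical demo class; not Hadoop code.
class InternDemo {

    // Strips the port from each "host:port" name, optionally interning the result.
    static String[] stripPorts(String[] names, boolean intern) {
        String[] hosts = new String[names.length];
        for (int i = 0; i < names.length; i++) {
            String host = names[i].split(":")[0];
            hosts[i] = intern ? host.intern() : host;
        }
        return hosts;
    }

    public static void main(String[] args) {
        // Two entries for the same host, differing only in port.
        String[] names = {"node1:50010", "node1:50011"};

        String[] plain = stripPorts(names, false);
        String[] interned = stripPorts(names, true);

        // Equal contents, but two distinct objects on the heap.
        System.out.println(plain[0] == plain[1]);        // prints "false"

        // After intern(), both slots reference the same object.
        System.out.println(interned[0] == interned[1]);  // prints "true"
    }
}
```

With many splits pointing at a limited set of hosts, sharing one String per host name is where the saving would come from. Where the interned strings are stored depends on the JVM in use.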

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

