hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-4229) Intern counter names in the JT
Date Thu, 18 Oct 2012 22:14:03 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-4229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13479420#comment-13479420 ]

Robert Joseph Evans commented on MAPREDUCE-4229:

I ran some benchmarks against the Job History Server using a jhist file for a job that
had 9416 maps and 500 reducers. I then used a combination of YourKit and jhat to measure
the heap savings.

In jhat I ran the OQL query {noformat}select sum(map(heap.objects("java.lang.String"),"sizeof(it)")){noformat}
to get the total size of all of the strings currently reachable on the heap.

I saw no difference between the base and the first patch: both had 22MB of strings on the
heap. Looking at the code that was changed to do interning, the only code that uses it is
Rumen. It is still a good change, but it did not have the impact I was looking for. So I
implemented the patch I just attached, which also interns the Strings that are parsed out of
the jhist file. This reduced the 22MB of strings to 3MB.
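A minimal sketch of why interning parsed strings helps. It assumes a hypothetical `NAME=VALUE` field layout (not the real jhist format) and uses the JDK's `String.intern()`; the attached patch may use a different interning mechanism. Every parse returns a fresh String, so an identical counter name reported by thousands of tasks is stored thousands of times until each copy is replaced by one canonical instance.

```java
// Hypothetical sketch: interning names as they are parsed out of a history record.
// The "NAME=VALUE" field format is an illustrative assumption, not the jhist format.
final class InterningParser {

    // Splits "NAME=VALUE" and returns NAME. Without interning, every call
    // returns a new String object, so equal names parsed from many task
    // records each keep their own copy alive.
    static String parseName(String field, boolean intern) {
        int eq = field.indexOf('=');
        String name = field.substring(0, eq); // new String object on each call
        return intern ? name.intern() : name; // canonical copy from the JVM string pool
    }

    public static void main(String[] args) {
        String a = parseName("FILE_BYTES_READ=42", false);
        String b = parseName("FILE_BYTES_READ=7", false);
        // Equal contents, but two separate objects on the heap.
        System.out.println(a.equals(b) + " " + (a == b)); // prints "true false"

        String c = parseName("FILE_BYTES_READ=42", true);
        String d = parseName("FILE_BYTES_READ=7", true);
        // Equal contents, and the same single object.
        System.out.println(c.equals(d) + " " + (c == d)); // prints "true true"
    }
}
```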

I want to do something similar for the AM, but it is harder to analyze, and I don't
think I will have time in the near future. So if someone else could review this, we can
check it in and file a follow-up JIRA for the AM.
> Intern counter names in the JT
> ------------------------------
>                 Key: MAPREDUCE-4229
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-4229
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 1.0.2, 3.0.0, 2.0.2-alpha
>            Reporter: Todd Lipcon
>         Attachments: MAPREDUCE-4229-branch-0.23.patch, MAPREDUCE-4229.patch, MR-4229.txt
> In our experience, most of the memory in production JTs goes to storing counter names
> (String objects and character arrays). Since most counter names are reused again and again,
> it would be a big memory savings to keep a hash set of already-used counter names within a
> job, and refer to the same object from all tasks.
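The proposal in the description can be sketched as a per-job table of canonical counter names. `JobCounterNameInterner` is a hypothetical name, and the locking is an assumption that names may arrive from concurrent task updates. A `Map` from each name to itself is used rather than a `HashSet`, because `java.util.HashSet` cannot hand back the instance it already holds.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical per-job interner: remember each counter name the first time
// it is seen and return that same object for every later task of the job.
final class JobCounterNameInterner {
    // Maps each name to its canonical instance (the first one registered).
    private final Map<String, String> seen = new HashMap<>();

    // Returns the canonical instance for `name`, registering it on first use.
    synchronized String intern(String name) {
        String canonical = seen.putIfAbsent(name, name);
        return canonical == null ? name : canonical;
    }

    // Number of distinct names retained for this job.
    synchronized int size() {
        return seen.size();
    }

    public static void main(String[] args) {
        JobCounterNameInterner interner = new JobCounterNameInterner();
        // Simulate two tasks reporting the same counter name as separate objects.
        String fromTask1 = new String("HDFS_BYTES_READ");
        String fromTask2 = new String("HDFS_BYTES_READ");
        String x = interner.intern(fromTask1);
        String y = interner.intern(fromTask2);
        System.out.println((x == y) + " " + interner.size()); // prints "true 1"
    }
}
```

Because the table is scoped to one job, it can be discarded along with the job, whereas `String.intern()` uses a single JVM-wide pool shared by everything in the process.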
