hadoop-mapreduce-issues mailing list archives

From "Allen Wittenauer (JIRA)" <j...@apache.org>
Subject [jira] Commented: (MAPREDUCE-2345) Optimize jobtracker's memory usage
Date Fri, 18 Mar 2011 19:40:29 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13008597#comment-13008597 ]

Allen Wittenauer commented on MAPREDUCE-2345:
---------------------------------------------

> But how about a running job with tens of thousands of tasks? We see that big running
> jobs use much memory in the cluster.
This is almost always a sign that the data being read is not laid out efficiently (or the
block size is too small), that one needs to use CombineFileInputFormat, or that there are
simply too many reducers in play.  There is almost never a reason to have jobs with tasks in
the x0,000 range unless the dataset is Just That Big.
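The arithmetic behind this advice is straightforward: the map task count scales with the number of input splits, so a larger block/split size (or combining small files into bigger splits) directly shrinks the task count the JobTracker must track. A quick sketch, with a hypothetical 10 TiB input that is not from the thread:

```python
# Illustrative arithmetic only (the input size and split sizes are hypothetical):
# map tasks ~= ceil(input size / split size), so bigger splits mean fewer tasks.
import math

input_bytes = 10 * 1024**4                      # a hypothetical 10 TiB input
for split_mib in (64, 256, 1024):               # candidate split/block sizes
    splits = math.ceil(input_bytes / (split_mib * 1024**2))
    print(f"{split_mib} MiB splits -> {splits} map tasks")
```

At 64 MiB splits this input needs 163,840 map tasks; at 1 GiB splits, only 10,240 -- which is why split layout, not JobTracker heap, is usually the first thing to fix.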

> Optimize jobtracker's memory usage
> -----------------------------------
>
>                 Key: MAPREDUCE-2345
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2345
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: jobtracker
>    Affects Versions: 0.21.0
>            Reporter: MengWang
>              Labels: hadoop
>             Fix For: 0.23.0
>
>         Attachments: jt-memory-useage.bmp
>
>
> Too many tasks will eat up a considerable amount of the JobTracker's heap space. According
to our observation, a 50GB heap can support up to 5,000,000 tasks, so we should optimize the
JobTracker's memory usage to allow more jobs and tasks. A YourKit Java profile shows that counters,
duplicate strings, and task objects waste the most memory. Our optimization around these three
points reduced the JobTracker's memory usage to roughly one third.
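As a back-of-envelope check on the figures quoted above (assuming the heap is dominated by per-task state):

```python
# Rough check of the description's numbers: 50 GB of heap supporting
# 5,000,000 tasks implies on the order of 10 KiB of JobTracker heap per task.
heap_bytes = 50 * 1024**3          # 50 GiB heap
tasks = 5_000_000                  # tasks supported at that heap size
per_task = heap_bytes // tasks     # whole bytes of heap per task
print(per_task)                    # 10737 bytes, i.e. ~10 KiB per task
```

Cutting memory usage to a third would bring that to roughly 3.5 KiB per task, which is consistent with counters and duplicated strings being the bulk of the overhead.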

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
