hadoop-mapreduce-issues mailing list archives

From "Robert Joseph Evans (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3057) Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB
Date Fri, 14 Oct 2011 14:24:12 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127569#comment-13127569 ]

Robert Joseph Evans commented on MAPREDUCE-3057:

If we set the value too small, the server runs slowly.  If we set it too large, the server
does not run at all.  I would rather err on the side of caution, but with some numbers to back
up my decision.

From my previous back-of-the-envelope calculation:
||Number of Jobs||Heap Needed||Heap Rounded Up to a Reasonable Value||

100 jobs would require around 850MB of heap, maybe 1GB to be safe.  Is that the default heap
size that we are going to use to launch the history server?  It fits easily within what a
32-bit JVM can address, so if we make sure that the default max heap and the default job cache
size are in line with one another, it seems good to me.  But if the default is 2GB, then let's
go with a default of 200.  It would also be good to update the documentation and the value in
mapred-default.xml to indicate that this setting ties directly to the heap needed.
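
To make the arithmetic above easy to re-check, here is a rough sketch of the estimate. It assumes heap usage scales linearly with the number of cached jobs at about 850MB per 100 jobs, which is only the back-of-the-envelope figure from this comment, not a measurement. The function names are made up for illustration and are not part of Hadoop.

```python
import math

# Assumption from the comment above: ~850 MB of heap per 100 cached jobs,
# scaling linearly. This is an estimate, not a measured value.
MB_PER_JOB = 850 / 100  # 8.5 MB per job


def heap_needed_mb(num_jobs: int) -> float:
    """Estimated heap (in MB) needed to keep num_jobs in the job cache."""
    return num_jobs * MB_PER_JOB


def max_jobs_for_heap(heap_mb: int) -> int:
    """Largest job cache size whose estimated heap fits in heap_mb."""
    return math.floor(heap_mb / MB_PER_JOB)


if __name__ == "__main__":
    for jobs in (100, 200, 1200):
        print(f"{jobs} jobs -> ~{heap_needed_mb(jobs):.0f} MB")
    for heap in (1024, 2048):
        print(f"{heap} MB heap -> at most {max_jobs_for_heap(heap)} jobs")
```

Under this assumption a 1GB heap tops out at about 120 jobs and a 2GB heap at about 240, so defaults of 100 and 200 leave some headroom. The same estimate puts 1200 jobs at roughly 10200MB, slightly above the -Xmx10000m used in the original report.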

+1 for 100 (non-binding)
> Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB
> --------------------------------------------------------------------------------
>                 Key: MAPREDUCE-3057
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3057
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Karam Singh
>            Assignee: Eric Payne
>            Priority: Blocker
>             Fix For: 0.23.0
>         Attachments: MAPREDUCE-3057.v1.txt
> History server was started with -Xmx10000m
> Ran GridMix V3 with 1200 Jobs trace in STRESS mode on 350 nodes with each node 4 NMS.
> All jobs finished as reported by RM Web UI and HADOOP_MAPRED_HOME/bin/mapred job -list
> But found that the GridMix job client was stuck while trying to connect to the HistoryServer
> Then tried to do HADOOP_MAPRED_HOME/bin/mapred job -status jobid
> JobClient also got stuck while looking for token to connect to History server
> Then looked at the History Server logs and found the History Server is throwing a
> "java.lang.OutOfMemoryError: GC overhead limit exceeded" error.
> With 10GB of heap space and 1200 jobs, the History Server should not run out of memory,
> no matter what type of jobs are run.
