hadoop-mapreduce-issues mailing list archives

From "Siddharth Seth (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-3057) Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB
Date Fri, 14 Oct 2011 20:54:11 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-3057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13127850#comment-13127850 ]

Siddharth Seth commented on MAPREDUCE-3057:
-------------------------------------------

Ran a couple of sleep jobs (usage from a heap dump):
1000m, 1r -> 15.9M (~1.6K per map)
1m, 1000r -> 20.1M (~2 K per reduce)

This cache is only populated when a job is actually accessed. For example, Oozie checking
for job completion status (a single call, so the job is read from HDFS and cached, but may
never be accessed again), or users accessing the UI/CLI (the first call triggers a read from
HDFS).
The more important cache is the jobListCache, which is effectively the list of jobs available
on the UI. Its default size is 20K entries (a couple of KB per job).
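The size-capped, evict-on-overflow behavior described above can be sketched with an access-ordered LinkedHashMap. This is a hypothetical illustration of the caching pattern, not the actual JobHistory server code; the class name and capacity parameter are made up for the example:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of a size-bounded LRU job cache, similar in spirit
// to the history server's loaded-job cache. The capacity would come from
// deployment configuration rather than being hard-coded.
public class BoundedJobCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public BoundedJobCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder=true gives LRU eviction order
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least-recently-accessed job once the cap is exceeded,
        // so a burst of job lookups cannot grow the heap without bound.
        return size() > maxEntries;
    }

    public static void main(String[] args) {
        BoundedJobCache<String, String> cache = new BoundedJobCache<>(2);
        cache.put("job_1", "details-1");
        cache.put("job_2", "details-2");
        cache.get("job_1");                 // touch job_1; job_2 is now eldest
        cache.put("job_3", "details-3");    // exceeds cap of 2, evicts job_2
        System.out.println(cache.containsKey("job_2")); // false
        System.out.println(cache.size());               // 2
    }
}
```

With a small cap, a scripted scan over many large jobs only ever holds a bounded number of fully-loaded jobs in memory; earlier entries fall out and are re-read from HDFS on the next access.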

We can't really make assumptions about access patterns, or about whether an average is good
enough to decide on the default size. A scripted fetch of jobs with 1000 reduces each (1000r
* 50 jobs, at ~20 MB per job that's roughly 1 GB) would cause an OOM on a 1 GB heap, which
then affects all other users.
I'd prefer keeping the default really low and letting individual deployments adjust the value
via mapred-site.
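For reference, a deployment would override the cache sizes with something along these lines in mapred-site.xml. The property names shown are the ones used in later Hadoop releases and are an assumption here; check the mapred-default.xml shipped with the release in use:

```xml
<!-- Hypothetical mapred-site.xml fragment; property names are taken from
     later Hadoop releases and may differ in 0.23. -->
<property>
  <name>mapreduce.jobhistory.joblist.cache.size</name>
  <value>20000</value> <!-- entries in the job-list cache backing the UI -->
</property>
<property>
  <name>mapreduce.jobhistory.loadedjobs.cache.size</name>
  <value>5</value> <!-- fully-loaded jobs kept in memory at once -->
</property>
```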
                
> Job History Server goes of OutOfMemory with 1200 Jobs and Heap Size set to 10 GB
> --------------------------------------------------------------------------------
>
>                 Key: MAPREDUCE-3057
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-3057
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: jobhistoryserver, mrv2
>    Affects Versions: 0.23.0
>            Reporter: Karam Singh
>            Assignee: Eric Payne
>            Priority: Blocker
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-3057.v1.txt
>
>
> The History Server was started with -Xmx10000m.
> Ran GridMix V3 with a 1200-job trace in STRESS mode on 350 nodes, with 4 NMs per node.
> All jobs finished, as reported by the RM Web UI and by HADOOP_MAPRED_HOME/bin/mapred job -list all.
> But the GridMix job client was stuck while trying to connect to the History Server.
> Then tried HADOOP_MAPRED_HOME/bin/mapred job -status jobid.
> The JobClient also got stuck while looking for a token to connect to the History Server.
> Looked at the History Server logs and found it was throwing "java.lang.OutOfMemoryError: GC overhead limit exceeded".
> With 10 GB of heap space and 1200 jobs, the History Server should not go out of memory,
> no matter what the type of the jobs is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
