hadoop-common-dev mailing list archives

From "Tim Hawkins (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-4976) Mapper runs out of memory
Date Sun, 08 Mar 2009 19:27:56 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-4976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12680004#action_12680004 ]

Tim Hawkins commented on HADOOP-4976:
-------------------------------------

I have had several problems with running 0.19.0 on EC2.

Look very carefully at your out-of-memory error; it might not actually be out of memory. We
run on large EC2 instances, 5 map/reduce tasks per node, with the following JVM config.

-Xms2048m -Xmx2048m // preload the VM at 2G

-Xloggc:/mnt/logs/@taskid@.gc // enable VM logging

-XX:+UseConcMarkSweepGC // use concurrent garbage collection

-XX:-UseGCOverheadLimit // disable GC stall protection, otherwise processes with large memory churn tend to get aborted

The last option turns off a protection added in Java 6 that throws an out-of-memory
exception if the GC takes too long to run, even if there is plenty of memory left. Turning
it off seems to have increased stability dramatically.
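
For reference, a minimal sketch of how these flags can be passed to the task JVMs via
mapred.child.java.opts (e.g. in hadoop-site.xml or the per-job configuration); the heap
size and GC log path shown are our values and should be adjusted to your cluster:

  <property>
    <name>mapred.child.java.opts</name>
    <!-- @taskid@ is interpolated by the TaskTracker with the current task id -->
    <value>-Xms2048m -Xmx2048m -Xloggc:/mnt/logs/@taskid@.gc -XX:+UseConcMarkSweepGC -XX:-UseGCOverheadLimit</value>
  </property>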

We tend to overcommit on the JVM heaps because our usage pattern means that only a few very
large tasks get run amongst a stream of smaller tasks. 

> Mapper runs out of memory
> -------------------------
>
>                 Key: HADOOP-4976
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4976
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: mapred
>    Affects Versions: 0.19.0
>         Environment: Amazon EC2 Extra Large instance (4 cores, 15 GB RAM), Sun Java 6 (1.6.0_10); 1 Master, 4 Slaves (all the same); each Java process takes the argument "-Xmx700m" (2 Java processes per Instance)
>            Reporter: Richard J. Zak
>             Fix For: 0.19.2, 0.20.0
>
>
> The hadoop job has the task of processing 4 directories in HDFS, each with 15 files. This is sample data, a test run, before I go to the needed 5 directories of about 800 documents each. The mapper takes in nearly 200 pages (not files) and throws an OutOfMemory exception. The largest file is 17 MB.
> If this problem is something on my end and not truly a bug, I apologize. However, after Googling a bit, I did see many threads of people running out of memory with small data sets.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

