hadoop-common-user mailing list archives

From Kelly Burkhart <kelly.burkh...@gmail.com>
Subject Reduce java.lang.OutOfMemoryError
Date Wed, 16 Feb 2011 15:00:07 GMT
Hello, I'm seeing frequent failures in reduce tasks, with errors similar to this:


2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
header: attempt_201102081823_0175_m_002153_0, compressed len: 172492,
decompressed len: 172488
2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner:
attempt_201102081823_0175_r_000034_0 : Map output copy failure :
java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
Shuffling 172488 bytes (172492 raw bytes) into RAM from
attempt_201102081823_0175_m_002153_0
2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
header: attempt_201102081823_0175_m_002118_0, compressed len: 161944,
decompressed len: 161940
2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
header: attempt_201102081823_0175_m_001704_0, compressed len: 228365,
decompressed len: 228361
2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task
attempt_201102081823_0175_r_000034_0: Failed fetch #1 from
attempt_201102081823_0175_m_002153_0
2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner:
attempt_201102081823_0175_r_000034_0 : Map output copy failure :
java.lang.OutOfMemoryError: Java heap space
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

Some of the failed attempts also show this:

Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
	at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
	at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
	at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
	at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
	at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

The particular job I'm running attempts to merge multiple time-series
files into a single file.  The job tracker shows the following:


Kind    Num Tasks    Complete   Killed    Failed/Killed Task Attempts
map     15795        15795      0         0 / 29
reduce  100          30         70        17 / 29

All of the files I'm reading have records with a timestamp key similar to:

2011-01-03 08:30:00.457000<tab><record>

My mapper is a simple Python program that ignores rows with times
before 08:30:00 or after 15:00:00, determines the type of each input
row, and writes it to stdout with very minor modification.  It
maintains no state and should not use any significant memory.  My
reducer is the IdentityReducer.  The input files are individually
gzipped and then put into HDFS.  The total uncompressed size of the
output should be around 150 GB.  Our cluster has 32 nodes, each with
16 GB of RAM, and most of them have two 2 TB drives.  We're running
Hadoop 0.20.2.
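
For reference, the mapper is roughly equivalent to the sketch below (a
simplified, illustrative version; the per-record-type handling is
omitted and the real script differs in the details):

#!/usr/bin/env python
# Simplified sketch of the streaming mapper described above:
# keep records in the 08:30:00-15:00:00 window and echo them with
# minimal changes.  Per-record-type handling is omitted.
import sys

START = "08:30:00"
END = "15:00:00"

for line in sys.stdin:
    key, sep, record = line.rstrip("\n").partition("\t")
    if not sep:
        continue  # skip lines without a <tab>-separated record
    # key looks like "2011-01-03 08:30:00.457000"; take the time part
    time_part = key.split(" ", 1)[1] if " " in key else ""
    # plain string comparison works for this fixed-width HH:MM:SS format
    if time_part < START or time_part > END:
        continue  # outside the time window we care about
    # ... minor per-record-type modifications would go here ...
    sys.stdout.write(key + "\t" + record + "\n")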


Can anyone provide some insight into how we can eliminate this issue?
I'm sure this email doesn't provide enough information; please let me
know what further details are needed to troubleshoot.

Thanks in advance,

-Kelly
