hadoop-common-user mailing list archives

From James Seigel <ja...@tynt.com>
Subject Re: Reduce java.lang.OutOfMemoryError
Date Wed, 16 Feb 2011 15:16:02 GMT
Well, the first thing I'd ask to see (if we can) is the code, or a
description of what your reducer is doing.

If it is holding on to objects too long or accumulating lists, then with
the right amount of data you will run out of memory (OOM).
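
To illustrate (a minimal sketch against the old 0.20 mapred API, with a
hypothetical class name, not your actual code): a reducer shaped like this
will eventually exhaust the child heap once any single key carries enough
values, because it buffers them all before emitting anything.

import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class BufferingReducer extends MapReduceBase
    implements Reducer<Text, Text, Text, Text> {

  public void reduce(Text key, Iterator<Text> values,
                     OutputCollector<Text, Text> out, Reporter reporter)
      throws IOException {
    // Anti-pattern: every value for this key is copied and held on the heap
    // before anything is written out.
    List<Text> buffered = new ArrayList<Text>();
    while (values.hasNext()) {
      buffered.add(new Text(values.next()));
    }
    for (Text v : buffered) {
      out.collect(key, v);
    }
    // Safer: call out.collect() as each value is read and keep no per-key state.
  }
}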

Another thought is that you simply haven't allocated enough memory for the
reducer to run properly. Try passing in a setting that raises the reducer's
heap; 768 MB, perhaps.
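
If the job is driven from Java, something along these lines is one way to
pass that in (a minimal sketch for the 0.20.x old API; the property names
are the stock 0.20.2 ones and the values are just examples, not tuned for
your job):

import org.apache.hadoop.mapred.JobConf;

public class RaiseReduceHeap {
  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(RaiseReduceHeap.class);
    // In 0.20.2 a single property sets the heap for both map and reduce
    // child JVMs (the default is -Xmx200m).
    conf.set("mapred.child.java.opts", "-Xmx768m");
    // Fraction of that heap the shuffle may fill with in-memory map outputs
    // (default 0.70); lowering it leaves more headroom during the copy phase.
    conf.setFloat("mapred.job.shuffle.input.buffer.percent", 0.50f);
    // ... then set the mapper/reducer, input/output paths, and submit with
    // JobClient.runJob(conf).
  }
}

For a streaming job the same properties should be settable with
-D name=value on the command line, before the streaming-specific options.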

James

Sent from my mobile. Please excuse the typos.

On 2011-02-16, at 8:12 AM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:

> I have had it fail with a single reducer and with 100 reducers.
> Ultimately it needs to be funneled to a single reducer though.
>
> -K
>
> On Wed, Feb 16, 2011 at 9:02 AM, real great..
> <greatness.hardness@gmail.com> wrote:
>> Hi,
>> How many reducers are you using currently?
>> Try increasing the number of reducers.
>> Let me know if it helps.
>>
>>> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:
>>
>>> Hello, I'm seeing frequent failures in reduce jobs, with errors similar
>>> to this:
>>>
>>>
>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002153_0, compressed len: 172492, decompressed len: 172488
>>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask: Shuffling 172488 bytes (172492 raw bytes) into RAM from attempt_201102081823_0175_m_002153_0
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_002118_0, compressed len: 161944, decompressed len: 161940
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: header: attempt_201102081823_0175_m_001704_0, compressed len: 228365, decompressed len: 228361
>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask: Task attempt_201102081823_0175_r_000034_0: Failed fetch #1 from attempt_201102081823_0175_m_002153_0
>>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner: attempt_201102081823_0175_r_000034_0 : Map output copy failure : java.lang.OutOfMemoryError: Java heap space
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> Some also show this:
>>>
>>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>        at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>>        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>>        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>>        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>
>>> The particular job I'm running is an attempt to merge multiple time
>>> series files into a single file.  The job tracker shows the following:
>>>
>>>
>>> Kind    Num Tasks    Complete   Killed    Failed/Killed Task Attempts
>>> map     15795        15795      0         0 / 29
>>> reduce  100          30         70        17 / 29
>>>
>>> All of the files I'm reading have records with a timestamp key similar to:
>>>
>>> 2011-01-03 08:30:00.457000<tab><record>
>>>
>>> My map job is a simple Python program that ignores rows with times <
>>> 08:30:00 or > 15:00:00, determines the type of each input row, and writes
>>> it to stdout with very minor modification.  It maintains no state and
>>> should not use any significant memory.  My reducer is the
>>> IdentityReducer.  The input files are individually gzipped and then put
>>> into HDFS.  The total uncompressed size of the output should be around
>>> 150G.  Our cluster is 32 nodes, each of which has 16G of RAM; most have
>>> two 2T drives.  We're running Hadoop 0.20.2.
>>>
>>>
>>> Can anyone provide some insight on how we can eliminate this issue?
>>> I'm certain this email does not provide enough info; please let me
>>> know what further information is needed to troubleshoot.
>>>
>>> Thanks in advance,
>>>
>>> -Kelly
>>>
>>
>>
>>
>> --
>> Regards,
>> R.V.
>>
