hadoop-common-user mailing list archives

From Kelly Burkhart <kelly.burkh...@gmail.com>
Subject Re: Reduce java.lang.OutOfMemoryError
Date Wed, 16 Feb 2011 16:20:54 GMT
I should have mentioned this in my last email: I thought of that so I
logged into every machine in the cluster; each machine's
mapred-site.xml has the same md5sum.
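
(For anyone wanting to repeat that check without logging into every box by
hand, here is a minimal sketch; it assumes passwordless ssh, and the host
list and conf path are made-up placeholders, not values from this thread.)

import subprocess

hosts = ["node01", "node02", "node03"]          # hypothetical host names
conf = "/etc/hadoop/conf/mapred-site.xml"       # hypothetical conf path

digests = {}
for host in hosts:
    out = subprocess.check_output(["ssh", host, "md5sum", conf])
    digests.setdefault(out.decode().split()[0], []).append(host)

for digest, same_hosts in digests.items():
    print("%s  %s" % (digest, ", ".join(same_hosts)))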

On Wed, Feb 16, 2011 at 10:15 AM, James Seigel <james@tynt.com> wrote:
> He might not have that conf distributed out to each machine
>
>
> Sent from my mobile. Please excuse the typos.
>
> On 2011-02-16, at 9:10 AM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:
>
>> Our cluster admin (who's out of town today) has mapred.child.java.opts
>> set to -Xmx1280 in mapred-site.xml.  However, if I go to the job
>> configuration page for a job I'm running right now, it claims this
>> option is set to -Xmx200m.  There are other settings in
>> mapred-site.xml that are different too.  Why would map/reduce jobs not
>> respect the mapred-site.xml file?
>>
>> -K
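
One common cause of the mismatch described above: in 0.20 the job
configuration is assembled on the submitting client from the config files
on its classpath plus any per-job overrides, and -Xmx200m is the stock
default for mapred.child.java.opts, so a submit host that never sees the
cluster's mapred-site.xml (with no final flag on the server side) would
show exactly that. A quick check is to download the job.xml linked from
the job page and compare it with the cluster's copy; below is a minimal
sketch that prints one property from a Hadoop-style configuration XML
(the property name and usage are assumptions for illustration).

import sys
import xml.etree.ElementTree as ET

def get_prop(path, name):
    # Hadoop config files are <configuration> roots holding
    # <property><name>...</name><value>...</value></property> children.
    for prop in ET.parse(path).getroot().findall("property"):
        if prop.findtext("name") == name:
            return prop.findtext("value")
    return None

if __name__ == "__main__":
    # e.g. python checkprop.py job.xml   (script name is hypothetical)
    print(get_prop(sys.argv[1], "mapred.child.java.opts"))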
>>
>> On Wed, Feb 16, 2011 at 9:43 AM, Jim Falgout <jim.falgout@pervasive.com> wrote:
>>> You can set the amount of memory used by the reducer using the mapreduce.reduce.java.opts
>>> property. Set it in mapred-site.xml or override it in your job. You can set it to something
>>> like -Xmx512M to increase the amount of memory used by the JVM spawned for the reducer task.
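
Since the job described further down is a streaming job, the same override
can also be passed per-job with the generic -D option rather than editing
mapred-site.xml. Below is a rough sketch of such a submission driven from
Python; the streaming jar location, HDFS paths and script name are
hypothetical placeholders, and note that 0.20.2 uses the single
mapred.child.java.opts property for both map and reduce JVMs.

import subprocess

cmd = [
    "hadoop", "jar",
    "/usr/lib/hadoop/contrib/streaming/hadoop-streaming.jar",  # hypothetical path
    "-D", "mapred.child.java.opts=-Xmx512m",   # heap for map and reduce tasks
    "-input", "/data/ticks",                   # hypothetical HDFS input
    "-output", "/data/ticks-merged",           # hypothetical HDFS output
    "-mapper", "filter_times.py",              # hypothetical mapper script
    "-reducer", "org.apache.hadoop.mapred.lib.IdentityReducer",
    "-file", "filter_times.py",
]
subprocess.check_call(cmd)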
>>>
>>> -----Original Message-----
>>> From: Kelly Burkhart [mailto:kelly.burkhart@gmail.com]
>>> Sent: Wednesday, February 16, 2011 9:12 AM
>>> To: common-user@hadoop.apache.org
>>> Subject: Re: Reduce java.lang.OutOfMemoryError
>>>
>>> I have had it fail with a single reducer and with 100 reducers.
>>> Ultimately it needs to be funneled to a single reducer though.
>>>
>>> -K
>>>
>>> On Wed, Feb 16, 2011 at 9:02 AM, real great..
>>> <greatness.hardness@gmail.com> wrote:
>>>> Hi,
>>>> How many reducers are you using currently?
>>>> Try increasing the number of reducers.
>>>> Let me know if it helps.
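
(For a streaming job the reducer count can also be raised per-job, e.g.
with the generic option -D mapred.reduce.tasks=100 or the streaming flag
-numReduceTasks 100; the exact invocation here is only a sketch, and
whether more reducers helps depends on the single-output requirement
mentioned earlier in the thread.)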
>>>>
>>>>> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:
>>>>
>>>>> Hello, I'm seeing frequent fails in reduce jobs with errors similar
>>>>> to
>>>>> this:
>>>>>
>>>>>
>>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>> header: attempt_201102081823_0175_m_002153_0, compressed len: 172492,
>>>>> decompressed len: 172488
>>>>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>
>>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>> Shuffling 172488 bytes (172492 raw bytes) into RAM from
>>>>> attempt_201102081823_0175_m_002153_0
>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>> header: attempt_201102081823_0175_m_002118_0, compressed len: 161944,
>>>>> decompressed len: 161940
>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>> header: attempt_201102081823_0175_m_001704_0, compressed len: 228365,
>>>>> decompressed len: 228361
>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>> Task
>>>>> attempt_201102081823_0175_r_000034_0: Failed fetch #1 from
>>>>> attempt_201102081823_0175_m_002153_0
>>>>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>
>>>>> Some also show this:
>>>>>
>>>>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>        at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>>>>        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>>>>        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>>>>        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>
>>>>> The particular job I'm running is an attempt to merge multiple time
>>>>> series files into a single file.  The job tracker shows the following:
>>>>>
>>>>>
>>>>> Kind    Num Tasks    Complete   Killed    Failed/Killed Task Attempts
>>>>> map     15795        15795      0         0 / 29
>>>>> reduce  100          30         70        17 / 29
>>>>>
>>>>> All of the files I'm reading have records with a timestamp key similar to:
>>>>>
>>>>> 2011-01-03 08:30:00.457000<tab><record>
>>>>>
>>>>> My map job is a simple Python program that ignores rows with times <
>>>>> 08:30:00 or > 15:00:00, determines the type of input row and writes
>>>>> it to stdout with very minor modification.  It maintains no state and
>>>>> should not use any significant memory.  My reducer is the
>>>>> IdentityReducer.  The input files are individually gzipped then put
>>>>> into hdfs.  The total uncompressed size of the output should be
>>>>> around 150G.  Our cluster is 32 nodes each of which has 16G RAM and
>>>>> most of which have two 2T drives.  We're running hadoop 0.20.2.
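
A stripped-down sketch of a streaming mapper along the lines described
above; the field handling and time-window test are inferred from the
record format shown earlier, and the type-specific modifications are
omitted, so treat it as an illustration rather than the actual script.

#!/usr/bin/env python
# Streaming mapper: drop records timestamped outside 08:30:00-15:00:00
# and pass everything else through unchanged.
import sys

LOW, HIGH = "08:30:00", "15:00:00"

for line in sys.stdin:
    key = line.split("\t", 1)[0]        # e.g. "2011-01-03 08:30:00.457000"
    fields = key.split(" ", 1)
    time_part = fields[1][:8] if len(fields) == 2 else ""
    if LOW <= time_part <= HIGH:
        sys.stdout.write(line)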
>>>>>
>>>>>
>>>>> Can anyone provide some insight on how we can eliminate this issue?
>>>>> I'm certain this email does not provide enough info, please let me
>>>>> know what further information is needed to troubleshoot.
>>>>>
>>>>> Thanks in advance,
>>>>>
>>>>> -Kelly
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> R.V.
>>>>
>>>
>>>
>>>
>
