hadoop-common-user mailing list archives

From Kelly Burkhart <kelly.burkh...@gmail.com>
Subject Re: Reduce java.lang.OutOfMemoryError
Date Wed, 16 Feb 2011 18:11:11 GMT
OK, the job was preferring the config file on my local machine, which
is not part of the cluster, over the cluster config files.  That seems
completely broken to me; my config was basically empty other than
containing the location of the cluster, and my job apparently used
defaults rather than the cluster config.  It doesn't make sense to me
to have to keep configuration files synchronized on every machine that
may access the cluster.
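
As I understand the final-parameter mechanism, the usual way to keep a
client-side config from winning is to pin the property on the cluster
side.  A minimal sketch of what that could look like in the cluster's
mapred-site.xml (the heap value below is illustrative only, not the
cluster's actual setting):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m</value>
      <!-- "final" stops later-loaded (e.g. job-submitted) configs from
           overriding this value on the daemons -->
      <final>true</final>
    </property>

With final set, attempted overrides are ignored and logged as warnings,
which lines up with James's suggestion below to watch the logs for
whatever is overriding the value.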

I'm running again; we'll see if it completes this time.

-K

On Wed, Feb 16, 2011 at 10:30 AM, James Seigel <james@tynt.com> wrote:
> Hrmmm.  Well, as you've pointed out, 200m is quite small and is
> probably the cause.
>
> Now there might be some overriding settings in whatever you are using
> to launch the job.
>
> You could set those values in the main conf so they can't be
> overridden, then see in the logs what tries to override them.
>
> Cheers
> James
>
> Sent from my mobile. Please excuse the typos.
>
> On 2011-02-16, at 9:21 AM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:
>
>> I should have mentioned this in my last email: I thought of that, so
>> I logged into every machine in the cluster; each machine's
>> mapred-site.xml has the same md5sum.
>>
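
A quick way to run that kind of spot-check from one box, assuming
passwordless ssh and a slaves file listing the nodes (the paths below
are placeholders, not the actual layout):

    # Compare mapred-site.xml checksums across all nodes; identical files
    # collapse to a single md5 line in the summary.
    for h in $(cat /opt/hadoop/conf/slaves); do
        ssh "$h" md5sum /opt/hadoop/conf/mapred-site.xml
    done | awk '{print $1}' | sort | uniq -c
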
>> On Wed, Feb 16, 2011 at 10:15 AM, James Seigel <james@tynt.com> wrote:
>>> He might not have that conf distributed out to each machine
>>>
>>>
>>> Sent from my mobile. Please excuse the typos.
>>>
>>> On 2011-02-16, at 9:10 AM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:
>>>
>>>> Our cluster admin (who's out of town today) has mapred.child.java.opts
>>>> set to -Xmx1280 in mapred-site.xml.  However, if I go to the job
>>>> configuration page for a job I'm running right now, it claims this
>>>> option is set to -Xmx200m.  There are other settings in
>>>> mapred-site.xml that are different too.  Why would map/reduce jobs not
>>>> respect the mapred-site.xml file?
>>>>
>>>> -K
>>>>
>>>>> On Wed, Feb 16, 2011 at 9:43 AM, Jim Falgout <jim.falgout@pervasive.com> wrote:
>>>>> You can set the amount of memory used by the reducer using the
>>>>> mapreduce.reduce.java.opts property.  Set it in mapred-site.xml or
>>>>> override it in your job.  You can set it to something like -Xmx512m
>>>>> to increase the amount of memory used by the JVM spawned for the
>>>>> reducer task.
>>>>>
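
For a streaming job like the one described further down in this thread,
a per-job override might look roughly like the sketch below.  This is an
illustration only: the input/output paths and mapper script name are
placeholders, and on 0.20.2 the old-style property name
mapred.child.java.opts is the one the framework reads.

    hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-0.20.2-streaming.jar \
        -D mapred.child.java.opts=-Xmx512m \
        -input /data/ticks/in -output /data/ticks/merged \
        -mapper filter_mapper.py \
        -reducer org.apache.hadoop.mapred.lib.IdentityReducer \
        -file filter_mapper.py

The -D option has to appear before the streaming-specific options
because it is consumed by the generic options parser.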
>>>>> -----Original Message-----
>>>>> From: Kelly Burkhart [mailto:kelly.burkhart@gmail.com]
>>>>> Sent: Wednesday, February 16, 2011 9:12 AM
>>>>> To: common-user@hadoop.apache.org
>>>>> Subject: Re: Reduce java.lang.OutOfMemoryError
>>>>>
>>>>> I have had it fail with a single reducer and with 100 reducers.
>>>>> Ultimately it needs to be funneled to a single reducer though.
>>>>>
>>>>> -K
>>>>>
>>>>> On Wed, Feb 16, 2011 at 9:02 AM, real great..
>>>>> <greatness.hardness@gmail.com> wrote:
>>>>>> Hi,
>>>>>> How many reducers are you using currently?
>>>>>> Try increasing the number of reducers.
>>>>>> Let me know if it helps.
>>>>>>
>>>>>> On Wed, Feb 16, 2011 at 8:30 PM, Kelly Burkhart <kelly.burkhart@gmail.com> wrote:
>>>>>>
>>>>>>> Hello, I'm seeing frequent failures in reduce jobs with errors
>>>>>>> similar to this:
>>>>>>>
>>>>>>>
>>>>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>> header: attempt_201102081823_0175_m_002153_0, compressed len: 172492, decompressed len: 172488
>>>>>>> 2011-02-15 15:21:10,163 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>>>>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>
>>>>>>> 2011-02-15 15:21:10,163 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>> Shuffling 172488 bytes (172492 raw bytes) into RAM from attempt_201102081823_0175_m_002153_0
>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>> header: attempt_201102081823_0175_m_002118_0, compressed len: 161944, decompressed len: 161940
>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>> header: attempt_201102081823_0175_m_001704_0, compressed len: 228365, decompressed len: 228361
>>>>>>> 2011-02-15 15:21:10,424 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>> Task attempt_201102081823_0175_r_000034_0: Failed fetch #1 from attempt_201102081823_0175_m_002153_0
>>>>>>> 2011-02-15 15:21:10,424 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>>>>> attempt_201102081823_0175_r_000034_0 : Map output copy failure :
>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>
>>>>>>> Some also show this:
>>>>>>>
>>>>>>> Error: java.lang.OutOfMemoryError: GC overhead limit exceeded
>>>>>>>        at sun.net.www.http.ChunkedInputStream.<init>(ChunkedInputStream.java:63)
>>>>>>>        at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:811)
>>>>>>>        at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:632)
>>>>>>>        at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1072)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getInputStream(ReduceTask.java:1447)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1349)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>
>>>>>>> The particular job I'm running is an attempt to merge multiple
>>>>>>> time series files into a single file.  The job tracker shows the
>>>>>>> following:
>>>>>>>
>>>>>>> Kind     Num Tasks   Complete   Killed   Failed/Killed Task Attempts
>>>>>>> map      15795       15795      0        0 / 29
>>>>>>> reduce   100         30         70       17 / 29
>>>>>>>
>>>>>>> All of the files I'm reading have records with a timestamp key
>>>>>>> similar to:
>>>>>>>
>>>>>>> 2011-01-03 08:30:00.457000<tab><record>
>>>>>>>
>>>>>>> My map job is a simple python program that ignores rows with times
>>>>>>> < 08:30:00 and > 15:00:00, determines the type of input row, and
>>>>>>> writes it to stdout with very minor modification.  It maintains no
>>>>>>> state and should not use any significant memory.  My reducer is the
>>>>>>> IdentityReducer.  The input files are individually gzipped and then
>>>>>>> put into hdfs.  The total uncompressed size of the output should be
>>>>>>> around 150G.  Our cluster has 32 nodes, each with 16G of RAM and
>>>>>>> most with two 2T drives.  We're running hadoop 0.20.2.
>>>>>>>
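
For anyone trying to reproduce this, a streaming mapper of the shape
described above might look roughly like the sketch below; the field
layout and window boundaries are assumptions, not the actual script.

    #!/usr/bin/env python
    # Sketch of a pass-through streaming mapper that drops records whose
    # timestamp falls outside the 08:30:00 - 15:00:00 window.  Assumes
    # lines of the form "YYYY-MM-DD HH:MM:SS.ffffff<tab><record>".
    import sys

    for line in sys.stdin:
        key, sep, rest = line.rstrip("\n").partition("\t")
        if not sep or " " not in key:
            continue                          # malformed line, skip it
        time_part = key.split(" ", 1)[1]      # e.g. "08:30:00.457000"
        if not ("08:30:00" <= time_part <= "15:00:00.999999"):
            continue                          # outside the time window
        sys.stdout.write(key + "\t" + rest + "\n")
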
>>>>>>>
>>>>>>> Can anyone provide some insight on how we can eliminate this
>>>>>>> issue?  I'm certain this email does not provide enough info;
>>>>>>> please let me know what further information is needed to
>>>>>>> troubleshoot.
>>>>>>>
>>>>>>> Thanks in advance,
>>>>>>>
>>>>>>> -Kelly
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Regards,
>>>>>> R.V.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>
>
