hadoop-common-user mailing list archives

From Ted Yu <yuzhih...@gmail.com>
Subject Re: Shuffle In Memory OutOfMemoryError
Date Thu, 11 Mar 2010 03:48:07 GMT
Thanks to Andy for the log he provided.

You can see from the log below that size increased steadily from 341535057
to 408181692, approaching maxSize. Then OOME:

2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
pos=start requestedSize=3893000 size=341535057 numPendingRequests=0
maxSize=417601952
2010-03-10 18:38:32,936 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
pos=end requestedSize=3893000 size=345428057 numPendingRequests=0
maxSize=417601952
...
2010-03-10 18:38:35,950 INFO org.apache.hadoop.mapred.ReduceTask: reserve:
pos=end requestedSize=635753 size=408181692 numPendingRequests=0
maxSize=417601952
2010-03-10 18:38:36,603 INFO org.apache.hadoop.mapred.ReduceTask: Task
attempt_201003101826_0001_r_000004_0: Failed fetch #1 from
attempt_201003101826_0001_m_000875_0

2010-03-10 18:38:36,603 WARN org.apache.hadoop.mapred.ReduceTask:
attempt_201003101826_0001_r_000004_0 adding host hd17.dfs.returnpath.net to
penalty box, next contact in 4 seconds
2010-03-10 18:38:36,604 INFO org.apache.hadoop.mapred.ReduceTask:
attempt_201003101826_0001_r_000004_0: Got 1 map-outputs from previous
failures
2010-03-10 18:38:36,605 FATAL org.apache.hadoop.mapred.TaskRunner:
attempt_201003101826_0001_r_000004_0 : Map output copy failure :
java.lang.OutOfMemoryError: Java heap space
        at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1513)
        at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1413)
        at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1266)
        at
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1200)

Looking at the calls to unreserve() in ReduceTask, two are in IOException
handlers and the other is in a sanity check (line 1557), meaning they wouldn't
be reached on the normal execution path.

I see one call in IFile.InMemoryReader:
      // Inform the RamManager
      ramManager.unreserve(bufferSize);

And InMemoryReader is used in
          Reader<K, V> reader =
            new InMemoryReader<K, V>(ramManager, mo.mapAttemptId,
                                     mo.data, 0, mo.data.length);
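To make the accounting concrete, here is a minimal sketch of the reserve/unreserve
bookkeeping the log lines above reflect. This is illustrative only, not the actual
ShuffleRamManager code: the class name is made up, and the fields simply follow the
names that appear in the log output (size, maxSize, numPendingRequests).

    // Illustrative sketch of the shuffle memory accounting; not the real class.
    class SimpleRamAccounting {
      private long size = 0;            // bytes currently reserved for in-memory shuffle
      private final long maxSize;       // e.g. 417601952 in the log above
      private int numPendingRequests = 0;

      SimpleRamAccounting(long maxSize) { this.maxSize = maxSize; }

      synchronized void reserve(long requestedSize) throws InterruptedException {
        numPendingRequests++;
        while (size + requestedSize > maxSize) {
          wait();                        // block until another copier unreserves
        }
        numPendingRequests--;
        size += requestedSize;           // size only grows on the success path
      }

      synchronized void unreserve(long requestedSize) {
        size -= requestedSize;           // e.g. reached via InMemoryReader above
        notifyAll();                     // wake copiers blocked in reserve()
      }
    }

The point is that nothing on the success path gives memory back until the in-memory
segment is actually consumed, so with many small outstanding segments size keeps
creeping toward maxSize, as in the log.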


On Wed, Mar 10, 2010 at 3:34 PM, Chris Douglas <chrisdo@yahoo-inc.com> wrote:

> I don't think this OOM is a framework bug per se, and given the
> rewrite/refactoring of the shuffle in MAPREDUCE-318 (in 0.21), tuning the
> 0.20 shuffle semantics is likely not worthwhile (though data informing
> improvements to trunk would be excellent). Most likely (and tautologically),
> ReduceTask simply requires more memory than is available and the job failure
> can be avoided by either 0) increasing the heap size or 1) lowering
> mapred.job.shuffle.input.buffer.percent. Most of the tasks we run have a heap of
> 1GB. For a reduce fetching >200k map outputs, that's a reasonable, even
> stingy amount of space. -C
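For completeness, a sketch of how those two knobs could be applied to a 0.20 job.
The values are placeholders rather than recommendations, and the class name is made
up; the buffer property appears as mapred.job.shuffle.input.buffer.percent in the
0.20 configuration (default 0.70).

    // Illustrative only: applying the two suggestions above via the job configuration.
    import org.apache.hadoop.mapred.JobConf;

    public class ShuffleTuningExample {
      public static void main(String[] args) {
        JobConf conf = new JobConf(ShuffleTuningExample.class);

        // 0) give the reduce JVMs a larger heap (Andy's current setting is -Xmx640m)
        conf.set("mapred.child.java.opts", "-Xmx1024m");

        // 1) or keep the heap and let the shuffle use a smaller share of it
        conf.set("mapred.job.shuffle.input.buffer.percent", "0.20");
      }
    }

Either knob alone changes how much of the heap the in-memory shuffle may claim.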
>
>
> On Mar 10, 2010, at 5:26 AM, Ted Yu wrote:
>
>  I verified that size and maxSize are long. This means MR-1182 didn't
>> resolve
>> Andy's issue.
>>
>> According to Andy:
>> At the beginning of the job there are 209,754 pending map tasks and 32
>> pending reduce tasks
>>
>> My guess is that GC wasn't reclaiming memory fast enough, leading to OOME
>> because of the large number of in-memory shuffle candidates.
>>
>> My suggestion for Andy would be to:
>> 1. add -verbose:gc as a JVM parameter
>> 2. modify reserve() slightly to calculate the maximum outstanding
>> numPendingRequests and print the maximum.
>>
>> Based on the output from the above two items, we can discuss a solution.
>> My intuition is to place an upper bound on numPendingRequests beyond which
>> canFitInMemory() returns false.
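To illustrate: item 1 just means appending -verbose:gc to mapred.child.java.opts,
and the bound could look roughly like the sketch below. This is only a sketch of
the idea, not a patch against ReduceTask; MAX_PENDING_REQUESTS and the class name
are invented here, and the right cap would come out of the numbers from items 1 and 2.

    // Sketch only: bound the number of outstanding reservations and track the peak.
    class PendingBoundedCheck {
      private static final int MAX_PENDING_REQUESTS = 10;  // hypothetical cap to experiment with
      private final long maxSingleShuffleLimit;
      private int numPendingRequests = 0;
      private int maxObservedPending = 0;                   // item 2: report this value

      PendingBoundedCheck(long maxSingleShuffleLimit) {
        this.maxSingleShuffleLimit = maxSingleShuffleLimit;
      }

      synchronized void requestStarted() {                  // call where reserve() begins
        numPendingRequests++;
        maxObservedPending = Math.max(maxObservedPending, numPendingRequests);
      }

      synchronized void requestFinished() {                 // call where reserve() returns
        numPendingRequests--;
      }

      synchronized boolean canFitInMemory(long requestedSize) {
        if (numPendingRequests > MAX_PENDING_REQUESTS) {
          return false;                                      // send this copy to disk instead
        }
        return requestedSize < Integer.MAX_VALUE && requestedSize < maxSingleShuffleLimit;
      }

      synchronized int getMaxObservedPending() { return maxObservedPending; }
    }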
>> My two cents.
>>
>> On Tue, Mar 9, 2010 at 11:51 PM, Christopher Douglas
>> <chrisdo@yahoo-inc.com> wrote:
>>
>>  That section of code is unmodified in MR-1182. See the patches/svn log.
>>> -C
>>>
>>> Sent from my iPhone
>>>
>>>
>>> On Mar 9, 2010, at 7:44 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>>>
>>> I just downloaded the hadoop-0.20.2 tarball from the Cloudera mirror.
>>>
>>>> This is what I see in ReduceTask (line 999):
>>>>   public synchronized boolean reserve(int requestedSize, InputStream in)
>>>>
>>>>   throws InterruptedException {
>>>>     // Wait till the request can be fulfilled...
>>>>     while ((size + requestedSize) > maxSize) {
>>>>
>>>> I don't see the fix from MR-1182.
>>>>
>>>> That's why I suggested to Andy that he manually apply MR-1182.
>>>>
>>>> Cheers
>>>>
>>>> On Tue, Mar 9, 2010 at 5:01 PM, Andy Sautins <andy.sautins@returnpath.net> wrote:
>>>>>
>>>>
>>>>
>>>>  Thanks Christopher.
>>>>>
>>>>> The heap size for reduce tasks is configured to be 640M (
>>>>> mapred.child.java.opts set to -Xmx640m ).
>>>>>
>>>>> Andy
>>>>>
>>>>> -----Original Message-----
>>>>> From: Christopher Douglas [mailto:chrisdo@yahoo-inc.com]
>>>>> Sent: Tuesday, March 09, 2010 5:19 PM
>>>>> To: common-user@hadoop.apache.org
>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>
>>>>> No, MR-1182 is included in 0.20.2
>>>>>
>>>>> What heap size have you set for your reduce tasks? -C
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Mar 9, 2010, at 2:34 PM, "Ted Yu" <yuzhihong@gmail.com> wrote:
>>>>>
>>>>> Andy:
>>>>>
>>>>>> You need to manually apply the patch.
>>>>>>
>>>>>> Cheers
>>>>>>
>>>>>> On Tue, Mar 9, 2010 at 2:23 PM, Andy Sautins <andy.sautins@returnpath.net> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>> Thanks Ted.  My understanding is that MAPREDUCE-1182 is included in the
>>>>>>> 0.20.2 release.  We upgraded our cluster to 0.20.2 this weekend and re-ran
>>>>>>> the same job scenarios.  Running with mapred.reduce.parallel.copies set to 1,
>>>>>>> we continue to have the same Java heap space error.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>>>>>>> Sent: Tuesday, March 09, 2010 12:56 PM
>>>>>>> To: common-user@hadoop.apache.org
>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>>>
>>>>>>> This issue has been resolved in
>>>>>>> http://issues.apache.org/jira/browse/MAPREDUCE-1182
>>>>>>>
>>>>>>> Please apply the patch
>>>>>>> M1182-1v20.patch
>>>>>>> <http://issues.apache.org/jira/secure/attachment/12424116/M1182-1v20.patch>
>>>>>
>>>>>
>>>>>>
>>>>>>> On Sun, Mar 7, 2010 at 3:57 PM, Andy Sautins <andy.sautins@returnpath.net> wrote:
>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>> Thanks Ted.  Very helpful.  You are correct that I misunderstood the code
>>>>>>>> at ReduceTask.java:1535.  I missed the fact that it's in an IOException catch
>>>>>>>> block.  My mistake.  That's what I get for being in a rush.
>>>>>>>>
>>>>>>>> For what it's worth I did re-run the job with
>>>>>>>> mapred.reduce.parallel.copies set with values from 5 all the way down to 1.
>>>>>>>> All failed with the same error:
>>>>>>>>
>>>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>> So from that it does seem like something else might be going on, yes?  I
>>>>>>>> need to do some more research.
>>>>>>>>
>>>>>>>> I appreciate your insights.
>>>>>>>>
>>>>>>>> Andy
>>>>>>>>
>>>>>>>> -----Original Message-----
>>>>>>>> From: Ted Yu [mailto:yuzhihong@gmail.com]
>>>>>>>> Sent: Sunday, March 07, 2010 3:38 PM
>>>>>>>> To: common-user@hadoop.apache.org
>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>>>>
>>>>>>>> My observation is based on this call chain:
>>>>>>>> MapOutputCopier.run() calling copyOutput() calling getMapOutput()
>>>>>>>> calling
>>>>>>>> ramManager.canFitInMemory(decompressedLength)
>>>>>>>>
>>>>>>>> Basically ramManager.canFitInMemory() makes its decision without considering
>>>>>>>> the number of MapOutputCopiers that are running. Thus 1.25 * 0.7 of the total
>>>>>>>> heap may be used in shuffling if default parameters were used.
>>>>>>>> Of course, you should check the value for mapred.reduce.parallel.copies to
>>>>>>>> see if it is 5. If it is 4 or lower, my reasoning wouldn't apply.
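To put rough numbers on that reasoning with Andy's settings (417,601,952 is the
maxSize from the log at the top of this thread; Runtime.maxMemory() for -Xmx640m
reports a bit less than 640MB, which is why maxSize is below 0.7 * 640MB):

    usable heap (Runtime.maxMemory)  ~= 596,574,217 bytes   (maxSize / 0.70)
    maxSize  = 0.70 * usable heap    ~= 417,601,952 bytes
    maxSingleShuffleLimit
             = 0.25 * maxSize        ~= 104,400,488 bytes
    worst case with 5 copiers
             = 5 * 0.25 * maxSize
             = 1.25 * maxSize        ~= 522,002,440 bytes

That is roughly 0.875 of the usable heap before counting anything else the reduce
task holds, which is consistent with the OOME arriving while size is still below
maxSize.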
>>>>>>>>
>>>>>>>> About the ramManager.unreserve() call: ReduceTask.java from hadoop 0.20.2 only
>>>>>>>> has 2731 lines, so I have to guess the location of the code snippet you
>>>>>>>> provided.
>>>>>>>> I found this around line 1535:
>>>>>>>>   } catch (IOException ioe) {
>>>>>>>>     LOG.info("Failed to shuffle from " +
>>>>>>>> mapOutputLoc.getTaskAttemptId(),
>>>>>>>>              ioe);
>>>>>>>>
>>>>>>>>     // Inform the ram-manager
>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
>>>>>>>>     ramManager.unreserve(mapOutputLength);
>>>>>>>>
>>>>>>>>     // Discard the map-output
>>>>>>>>     try {
>>>>>>>>       mapOutput.discard();
>>>>>>>>     } catch (IOException ignored) {
>>>>>>>>       LOG.info("Failed to discard map-output from " +
>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
>>>>>>>>     }
>>>>>>>> Please confirm the line number.
>>>>>>>>
>>>>>>>> If we're looking at the same code, I am afraid I don't see how we can
>>>>>>>> improve it. First, I assume IOException shouldn't happen that often. Second,
>>>>>>>> mapOutput.discard() just sets:
>>>>>>>>     data = null;
>>>>>>>> for the in-memory case. Even if we call mapOutput.discard() before
>>>>>>>> ramManager.unreserve(), we don't know when GC would kick in and make more
>>>>>>>> memory available.
>>>>>>>> Of course, given the large number of map outputs in your system, it became
>>>>>>>> more likely that the root cause from my reasoning made the OOME happen sooner.
>>>>>>>
>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>> On Sun, Mar 7, 2010 at 1:03 PM, Andy Sautins <andy.sautins@returnpath.net> wrote:
>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>  Ted,
>>>>>>>>>
>>>>>>>>> I'm trying to follow the logic in your mail and I'm not sure I'm
>>>>>>>>> following.  If you would mind helping me understand I would appreciate it.
>>>>>>>>>
>>>>>>>>> Looking at the code, maxSingleShuffleLimit is only used in determining if
>>>>>>>>> the copy _can_ fit into memory:
>>>>>>>>>
>>>>>>>>> boolean canFitInMemory(long requestedSize) {
>>>>>>>>>   return (requestedSize < Integer.MAX_VALUE &&
>>>>>>>>>           requestedSize < maxSingleShuffleLimit);
>>>>>>>>>  }
>>>>>>>>>
>>>>>>>>> It also looks like the RamManager.reserve should wait until memory is
>>>>>>>>> available, so it shouldn't hit a memory limit for that reason.
>>>>>>>>>
>>>>>>>>> What does seem a little strange to me is the following ( ReduceTask.java
>>>>>>>>> starting at 2730 ):
>>>>>>>>>
>>>>>>>>>     // Inform the ram-manager
>>>>>>>>>     ramManager.closeInMemoryFile(mapOutputLength);
>>>>>>>>>     ramManager.unreserve(mapOutputLength);
>>>>>>>>>
>>>>>>>>>     // Discard the map-output
>>>>>>>>>     try {
>>>>>>>>>       mapOutput.discard();
>>>>>>>>>     } catch (IOException ignored) {
>>>>>>>>>       LOG.info("Failed to discard map-output from " +
>>>>>>>>>                mapOutputLoc.getTaskAttemptId(), ignored);
>>>>>>>>>     }
>>>>>>>>>     mapOutput = null;
>>>>>>>>>
>>>>>>>>> So to me that looks like the ramManager unreserves the memory before the
>>>>>>>>> mapOutput is discarded.  Shouldn't the mapOutput be discarded _before_ the
>>>>>>>>> ramManager unreserves the memory?  If the memory is unreserved before the
>>>>>>>>> actual underlying data references are removed then it seems like another
>>>>>>>>> thread can try to allocate memory ( ReduceTask.java:2730 ) before the
>>>>>>>>> previous memory is disposed ( mapOutput.discard() ).
>>>>>>>>>
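For what it's worth, the reordering being described would look roughly like the
sketch below. This is not the actual ReduceTask code: RamManager and MapOutput here
are minimal stand-ins for the real 0.20 types, and the method is invented for
illustration.

    // Sketch of the proposed ordering: discard the data before unreserving the space.
    class DiscardBeforeUnreserve {
      interface RamManager {
        void closeInMemoryFile(int length);
        void unreserve(int length);
      }
      interface MapOutput {
        void discard() throws java.io.IOException;  // in-memory case: just sets data = null
      }

      static void release(RamManager ramManager, MapOutput mapOutput, int mapOutputLength) {
        // 1) drop the reference to the shuffled bytes first ...
        try {
          mapOutput.discard();
        } catch (java.io.IOException ignored) {
          // mirrors the "Failed to discard map-output" logging in ReduceTask
        }
        // 2) ... then tell the ram-manager the space is free, so another copier
        //    cannot reserve against memory whose data is still referenced.
        ramManager.closeInMemoryFile(mapOutputLength);
        ramManager.unreserve(mapOutputLength);
      }
    }

As noted elsewhere in the thread, even with this ordering the bytes are only
actually reclaimed when GC runs, so the reorder by itself may not prevent the OOME.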
>>>>>>>>> Not sure that makes sense.  One thing to note is that the particular job
>>>>>>>>> that is failing does have a good number ( 200k+ ) of map outputs.  The large
>>>>>>>>> number of small map outputs may be why we are triggering a problem.
>>>>>>>>>
>>>>>>>>> Thanks again for your thoughts.
>>>>>>>>>
>>>>>>>>> Andy
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Jacob R Rideout [mailto:apache@jacobrideout.net]
>>>>>>>>> Sent: Sunday, March 07, 2010 1:21 PM
>>>>>>>>> To: common-user@hadoop.apache.org
>>>>>>>>> Cc: Andy Sautins; Ted Yu
>>>>>>>>> Subject: Re: Shuffle In Memory OutOfMemoryError
>>>>>>>>>
>>>>>>>>> Ted,
>>>>>>>>>
>>>>>>>>> Thank you. I filed MAPREDUCE-1571 to cover this issue. I might have
>>>>>>>>> some time to write a patch later this week.
>>>>>>>>>
>>>>>>>>> Jacob Rideout
>>>>>>>>>
>>>>>>>>> On Sat, Mar 6, 2010 at 11:37 PM, Ted Yu <yuzhihong@gmail.com>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> I think there is a mismatch (in ReduceTask.java) between:
>>>>>>>>>>   this.numCopiers = conf.getInt("mapred.reduce.parallel.copies", 5);
>>>>>>>>>> and:
>>>>>>>>>>   maxSingleShuffleLimit = (long)(maxSize * MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION);
>>>>>>>>>> where MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION is 0.25f,
>>>>>>>>>> because
>>>>>>>>>>   copiers = new ArrayList<MapOutputCopier>(numCopiers);
>>>>>>>>>> so the total memory allocated for the in-mem shuffle is 1.25 * maxSize.
>>>>>>>>>>
>>>>>>>>>> A JIRA should be filed to correlate the constant 5 above and
>>>>>>>>>> MAX_SINGLE_SHUFFLE_SEGMENT_FRACTION.
>>>>>>>>>>
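One possible shape of that correlation is sketched below. This is only an
illustration of the idea, not what any JIRA actually changed; the class and field
names are invented.

    // Sketch: derive the per-segment limit from the copier count so the worst case
    // (numCopiers segments in flight) stays within maxSize instead of 1.25 * maxSize.
    class CorrelatedShuffleLimits {
      final long maxSize;                // 0.70 * heap with the default settings
      final long maxSingleShuffleLimit;  // per-segment cap

      CorrelatedShuffleLimits(long maxSize, int numCopiers) {
        this.maxSize = maxSize;
        this.maxSingleShuffleLimit = maxSize / numCopiers;  // e.g. maxSize / 5 instead of maxSize * 0.25f
      }
    }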
>>>>>>>>>> Cheers
>>>>>>>>>>
>>>>>>>>>> On Sat, Mar 6, 2010 at 8:31 AM, Jacob R Rideout <apache@jacobrideout.net> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We are seeing the following error in our reducers of a particular job:
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>>>>> Error: java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>> After enough reducers fail the entire job fails. This error occurs
>>>>>>>>>>> regardless of whether mapred.compress.map.output is true. We were able
>>>>>>>>>>> to avoid the issue by reducing mapred.job.shuffle.input.buffer.percent
>>>>>>>>>>> to 20%. Shouldn't the framework via ShuffleRamManager.canFitInMemory
>>>>>>>>>>> and ShuffleRamManager.reserve correctly detect the memory available for
>>>>>>>>>>> allocation? I would think that with poor configuration settings (and
>>>>>>>>>>> default settings in particular) the job may not be as efficient, but
>>>>>>>>>>> wouldn't die.
>>>>>>>>>>>
>>>>>>>>>>> Here is some more context in the logs; I have attached the full
>>>>>>>>>>> reducer log here: http://gist.github.com/323746
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> 2010-03-06 07:54:49,621 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>> Shuffling 4191933 bytes (435311 raw bytes) into RAM from
>>>>>>>>>>> attempt_201003060739_0002_m_000061_0
>>>>>>>>>>> 2010-03-06 07:54:50,222 INFO org.apache.hadoop.mapred.ReduceTask: Task
>>>>>>>>>>> attempt_201003060739_0002_r_000000_0: Failed fetch #1 from
>>>>>>>>>>> attempt_201003060739_0002_m_000202_0
>>>>>>>>>>> 2010-03-06 07:54:50,223 WARN org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>> attempt_201003060739_0002_r_000000_0 adding host
>>>>>>>>>>> hd37.dfs.returnpath.net to penalty box, next contact in 4 seconds
>>>>>>>>>>> 2010-03-06 07:54:50,223 INFO org.apache.hadoop.mapred.ReduceTask:
>>>>>>>>>>> attempt_201003060739_0002_r_000000_0: Got 1 map-outputs from previous
>>>>>>>>>>> failures
>>>>>>>>>>> 2010-03-06 07:54:50,223 FATAL org.apache.hadoop.mapred.TaskRunner:
>>>>>>>>>>> attempt_201003060739_0002_r_000000_0 : Map output copy failure :
>>>>>>>>>>> java.lang.OutOfMemoryError: Java heap space
>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1508)
>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>>>>>>>>>>>   at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>>>>>>>
>>>>>>>
>>>>>>>>
>>>>>>>>>>> We tried this both in 0.20.1 and 0.20.2. We had hoped MAPREDUCE-1182
>>>>>>>>>>> would address the issue in 0.20.2, but it did not. Does anyone have
>>>>>>>>>>> any comments or suggestions? Is this a bug I should file a JIRA for?
>>>>>>>>>>>
>>>>>>>>>>> Jacob Rideout
>>>>>>>>>>> Return Path
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>
