hadoop-mapreduce-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Harsh J <ha...@cloudera.com>
Subject Re: Reducers fail without messages on 20.205.0
Date Mon, 26 Dec 2011 20:31:28 GMT
Markus,

Thanks for clarifying! What heap size does this task launch with and what's the machine RAM
and the TaskTracker slots? Can you rule out the Linux OOM killer taking part here?

On 27-Dec-2011, at 1:50 AM, Markus Jelsma wrote:

> Hi,
> 
> They were indeed reported as failed and speculative execution is disabled 
> because this particular job is not side-effect free. We've seen this before 
> but it's rare.
> 
> Thanks
> 
>> Exit code 137 would mean a SIGKILL. Are you positive its reported as a
>> failed task and not as a killed task? If not, it may just be a result of
>> speculative reducer execution at work and is nothing to worry about.
>> 
>> On 27-Dec-2011, at 1:14 AM, Markus Jelsma wrote:
>>> Hi,
>>> 
>>> We sometimes see reducers fail just when all mappers are finishing. All
>>> mappers finish roughly at the same time. The reducers only dump the
>>> following exception:
>>> 
>>> java.lang.Throwable: Child Error
>>> 
>>> 	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:271)
>>> 
>>> Caused by: java.io.IOException: Task process exit with nonzero status of
>>> 137.
>>> 
>>> 	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:258)
>>> 
>>> The reducers own log output also shows nothing that gives a clue, this is
>>> the last part of the log:
>>> 
>>> 2011-12-26 19:35:19,116 INFO org.apache.hadoop.io.compress.CodecPool: Got
>>> brand-new decompressor
>>> 2011-12-26 19:35:19,117 INFO org.apache.hadoop.io.compress.CodecPool: Got
>>> brand-new decompressor
>>> 2011-12-26 19:35:19,117 INFO org.apache.hadoop.io.compress.CodecPool: Got
>>> brand-new decompressor
>>> 2011-12-26 19:35:19,120 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Thread started: Thread for merging
>>> on- disk files
>>> 2011-12-26 19:35:19,120 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Thread waiting: Thread for merging
>>> on- disk files
>>> 2011-12-26 19:35:19,121 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Thread started: Thread for merging
>>> in memory files
>>> 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Need another 50 map output(s) where
>>> 0 is already in progress
>>> 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Thread started: Thread for polling
>>> Map Completion Events
>>> 2011-12-26 19:35:19,122 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Scheduled 0 outputs (0 slow hosts
>>> and0 dup hosts)
>>> 2011-12-26 19:35:24,124 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Scheduled 2 outputs (0 slow hosts
>>> and0 dup hosts)
>>> 2011-12-26 19:35:25,805 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts
>>> and0 dup hosts)
>>> 2011-12-26 19:36:21,578 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Need another 47 map output(s) where
>>> 0 is already in progress
>>> 2011-12-26 19:36:21,593 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts
>>> and0 dup hosts)
>>> 2011-12-26 19:36:42,412 INFO org.apache.hadoop.mapred.ReduceTask:
>>> attempt_201112261420_0006_r_000009_0 Scheduled 1 outputs (0 slow hosts
>>> and0 dup hosts)
>>> 
>>> Is there any advice?
>>> Thanks


Mime
View raw message