hadoop-common-user mailing list archives

From Amandeep Khurana <ama...@gmail.com>
Subject Re: Tasktracker getting blacklisted
Date Fri, 04 Dec 2009 21:17:40 GMT
It seems the reducer isn't able to read from the mapper node. Do you see
anything in the datanode logs? Also, check the namenode logs. Make sure
you have DEBUG logging enabled.

-Amandeep


Amandeep Khurana
Computer Science Graduate Student
University of California, Santa Cruz
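
One possible way to get the DEBUG logging suggested above, as a sketch only: add logger overrides to conf/log4j.properties on the affected node and restart the daemon. The syntax is standard log4j 1.x; the logger names are taken from the class names in the quoted logs, and whether they are the most useful loggers for this problem is an assumption.

```properties
# conf/log4j.properties (log4j 1.x syntax)
# Logger names below are assumptions based on the classes seen in the logs.
log4j.logger.org.apache.hadoop.mapred.ReduceTask=DEBUG
log4j.logger.org.apache.hadoop.mapred.TaskRunner=DEBUG
```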


On Fri, Dec 4, 2009 at 12:29 PM, Madhur Khandelwal <madhur@thefind.com> wrote:

> Hi all,
>
> I have a 3-node cluster running a Hadoop (0.20.1) job. I am seeing the
> following exception during the SHUFFLE phase, which is causing the
> tasktracker on one of the nodes to be blacklisted (after 4 occurrences of
> the exception). The config is set to run 8 maps and 8 reduces
> simultaneously; all other settings are left at their defaults. Any
> pointers would be helpful.
>
> 2009-12-04 01:04:36,237 INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_200912031748_0002_m_000035_0
> java.net.SocketTimeoutException: Read timed out
>        at java.net.SocketInputStream.socketRead0(Native Method)
>        at java.net.SocketInputStream.read(SocketInputStream.java:129)
>        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
>        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
>        at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:221)
>        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:662)
>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2391)
>        at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149)
>        at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1522)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
> Here is the error message on the web jobtracker UI:
> java.io.IOException: Task process exit with nonzero status of 137.
>        at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
>
> Around the same time, the tasktracker log has the following WARN messages:
> 2009-12-04 01:09:19,051 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call ping(attempt_200912031748_0002_r_000008_0) from 127.0.0.1:42371: output error
> 2009-12-04 01:09:21,984 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getMapCompletionEvents(job_200912031748_0002, 38, 10000, attempt_200912031748_0002_r_000008_0) from 127.0.0.1:42371: output error
> 2009-12-04 01:10:02,114 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200912031748_0002_r_000008_0 Child Error
> 2009-12-04 01:10:07,567 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200912031748_0002_r_000008_0 done; removing files.
>
> There is one more exception I see in the task log, not sure if it's
> related:
> 2009-12-04 01:01:37,120 INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_200912031748_0002_m_000044_0
> java.io.IOException: Premature EOF
>        at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:234)
>        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:662)
>        at java.io.FilterInputStream.read(FilterInputStream.java:116)
>        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2391)
>        at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149)
>        at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1522)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
>        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
>
>
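
The "nonzero status of 137" in the quoted jobtracker error is consistent with the common shell convention that a status above 128 means the process was terminated by signal (status - 128). Under that reading, 137 = 128 + 9, which would mean the task JVM was killed with SIGKILL. That interpretation is an assumption and is not confirmed by the logs above. The class and method names in this sketch are hypothetical and are not part of Hadoop.

```java
// Sketch: interpret a child-process exit status using the shell convention
// that a status in the range 129..255 means "terminated by signal (status - 128)".
// ExitStatus is a hypothetical helper, not a Hadoop class.
public class ExitStatus {

    /** Returns the signal number implied by the status, or -1 if none. */
    public static int signalNumber(int status) {
        return (status > 128 && status < 256) ? status - 128 : -1;
    }

    /** Returns a short human-readable description of the status. */
    public static String describe(int status) {
        int sig = signalNumber(status);
        if (sig < 0) {
            return "exited with status " + status;
        }
        // Only the two most common termination signals are named here.
        String name = (sig == 9) ? "SIGKILL"
                    : (sig == 15) ? "SIGTERM"
                    : "signal " + sig;
        return "killed by " + name + " (128 + " + sig + ")";
    }

    public static void main(String[] args) {
        // The status reported in the jobtracker UI for the failed task.
        System.out.println(describe(137)); // prints "killed by SIGKILL (128 + 9)"
    }
}
```

If the task really was killed with SIGKILL, the tasktracker log and the system log on that node (for example, for out-of-memory kills) would be the places to look for who sent the signal.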
