hadoop-common-user mailing list archives

From "Madhur Khandelwal" <mad...@thefind.com>
Subject Tasktracker getting blacklisted
Date Fri, 04 Dec 2009 20:29:30 GMT
Hi all,

I have a 3-node cluster running a Hadoop (0.20.1) job. During the shuffle
phase I am seeing the following exception, and as a result the tasktracker
on one of the nodes is getting blacklisted (after 4 occurrences of the
exception). The config is set to run 8 maps and 8 reduces simultaneously;
all other settings are left at their defaults. Any pointers would be helpful.

2009-12-04 01:04:36,237 INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_200912031748_0002_m_000035_0
java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.read1(BufferedInputStream.java:256)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:317)
        at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:221)
        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:662)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2391)
        at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149)
        at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1522)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)

Here is the error message shown in the JobTracker web UI:
java.io.IOException: Task process exit with nonzero status of 137.
	at org.apache.hadoop.mapred.TaskRunner.run(TaskRunner.java:418)
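
For reference, an exit status above 128 usually follows the POSIX shell
convention of 128 plus the number of the signal that terminated the process,
so 137 would correspond to signal 9 (SIGKILL). Below is a minimal Python
sketch of that decoding. It assumes the child's status was reported through
a POSIX shell, and the helper name describe_exit_status is hypothetical (it
is not part of Hadoop). It says nothing about who sent the signal.

```python
import signal


def describe_exit_status(status):
    """Interpret an exit status using the POSIX shell convention (128 + signal)."""
    if status == 0:
        return "exited normally"
    if status > 128:
        signum = status - 128
        try:
            # Look up the symbolic name of the signal on this platform.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return "killed by signal %d (%s)" % (signum, name)
    return "exited with error code %d" % status


# Status taken from the JobTracker UI message above.
print(describe_exit_status(137))
```

If the status does decode to SIGKILL, the process was killed from outside
rather than exiting on its own, so the system logs on that node may be worth
checking for the same timestamp.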

Around the same time, the tasktracker log has the following WARN messages:
2009-12-04 01:09:19,051 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call ping(attempt_200912031748_0002_r_000008_0) from 127.0.0.1:42371: output error
2009-12-04 01:09:21,984 WARN org.apache.hadoop.ipc.Server: IPC Server Responder, call getMapCompletionEvents(job_200912031748_0002, 38, 10000, attempt_200912031748_0002_r_000008_0) from 127.0.0.1:42371: output error
2009-12-04 01:10:02,114 WARN org.apache.hadoop.mapred.TaskRunner: attempt_200912031748_0002_r_000008_0 Child Error
2009-12-04 01:10:07,567 INFO org.apache.hadoop.mapred.TaskRunner: attempt_200912031748_0002_r_000008_0 done; removing files.

I also see one more exception in the task log; I am not sure whether it is related:
2009-12-04 01:01:37,120 INFO org.apache.hadoop.mapred.ReduceTask: Failed to shuffle from attempt_200912031748_0002_m_000044_0
java.io.IOException: Premature EOF
        at sun.net.www.http.ChunkedInputStream.fastRead(ChunkedInputStream.java:234)
        at sun.net.www.http.ChunkedInputStream.read(ChunkedInputStream.java:662)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at sun.net.www.protocol.http.HttpURLConnection$HttpInputStream.read(HttpURLConnection.java:2391)
        at org.apache.hadoop.mapred.IFileInputStream.doRead(IFileInputStream.java:149)
        at org.apache.hadoop.mapred.IFileInputStream.read(IFileInputStream.java:101)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.shuffleInMemory(ReduceTask.java:1522)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.getMapOutput(ReduceTask.java:1408)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.copyOutput(ReduceTask.java:1261)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$MapOutputCopier.run(ReduceTask.java:1195)
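
To see whether the shuffle failures cluster on particular map attempts, the
task logs can be tallied with a short script. This is only a sketch: it
assumes the "Failed to shuffle from attempt_..." wording shown in the
excerpts above, and the function name count_shuffle_failures is
hypothetical. The pattern tolerates the line wrapping introduced by mail
clients.

```python
import re
from collections import Counter

# Matches the ReduceTask message seen in the excerpts above; \s+ allows the
# message to be wrapped across lines.
SHUFFLE_FAILURE = re.compile(
    r"Failed to\s+shuffle from\s+(attempt_\d+_\d+_m_\d+_\d+)"
)


def count_shuffle_failures(log_text):
    """Return a Counter mapping map-attempt IDs to shuffle failure counts."""
    return Counter(SHUFFLE_FAILURE.findall(log_text))


# Sample built from the two log lines quoted in this message.
sample = (
    "2009-12-04 01:04:36,237 INFO org.apache.hadoop.mapred.ReduceTask: Failed to\n"
    "shuffle from attempt_200912031748_0002_m_000035_0\n"
    "2009-12-04 01:01:37,120 INFO org.apache.hadoop.mapred.ReduceTask: "
    "Failed to shuffle from attempt_200912031748_0002_m_000044_0\n"
)

failures = count_shuffle_failures(sample)
for attempt, count in sorted(failures.items()):
    print(attempt, count)
```

Map attempts that show up repeatedly would point at the node that served
their output; the mapping from attempt to node has to come from the
JobTracker UI, since it is not in these log lines.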

