hadoop-common-user mailing list archives

From John Heidemann <jo...@isi.edu>
Subject "Broken pipe" in hadoop streaming...looking for debugging hints
Date Thu, 05 Jul 2007 18:10:31 GMT

I'm running hadoop streaming from svn (version 552930, reasonably
recent).  My map/reduce job maps ~1M records, but then a few reduces
succeed and many fail, eventually terminating the job unsuccessfully.
I'm looking for some debugging hints.


The failures are all "broken pipe":

java.io.IOException: R/W/S=1/0/0 in:0=1/342 [rec/s] out:0=0/342 [rec/s]
minRecWrittenToEnableSkip_=9223372036854775807 LOGNAME=null
HOST=null
USER=hadoop
HADOOP_USER=null
last Hadoop input: |null|
last tool output: |null|
Date: Thu Jul 05 10:46:28 PDT 2007
Broken pipe
	at org.apache.hadoop.streaming.PipeReducer.reduce(PipeReducer.java:91)
	at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:323)
	at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:1763)
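For what it's worth, my reading of that trace (an assumption on my part, not something the trace states) is that the reducer process closed its stdin, usually by exiting, while Hadoop still had records to write to it. The same failure is easy to provoke in a plain pipeline, where the reader quits early and the writer typically dies on SIGPIPE:

```shell
# 'head' plays the part of a reducer that exits before consuming its input;
# once the pipe buffer fills, seq's next write fails with a broken pipe,
# and seq is typically killed by SIGPIPE.
seq 1 1000000 | head -n 1   # prints: 1
```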


The log looks reasonable (to my naive eyes):

2007-07-05 10:46:24,248 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Need
2 map output(s)
2007-07-05 10:46:25,105 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Got
3 new map outputs from tasktracker and 0 map outputs from previous failures
2007-07-05 10:46:25,105 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Got
6 known map output location(s); scheduling...
2007-07-05 10:46:25,106 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Scheduled
5 of 6 known outputs (0 slow hosts and 1 dup hosts)
2007-07-05 10:46:25,106 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying
task_0004_m_000496_1 output from rim.isi.edu.
2007-07-05 10:46:25,106 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying
task_0004_m_000496_0 output from bay.isi.edu.
2007-07-05 10:46:25,107 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying
task_0004_m_000495_0 output from air.isi.edu.
2007-07-05 10:46:25,108 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying
task_0004_m_000495_1 output from hey.isi.edu.
2007-07-05 10:46:25,109 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Ignoring
obsolete copy result for Map Task: task_0004_m_000500_1 from host: sky.isi.edu
2007-07-05 10:46:25,111 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 done
copying task_0004_m_000496_0 output from bay.isi.edu.
2007-07-05 10:46:25,148 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 done
copying task_0004_m_000495_0 output from air.isi.edu.
2007-07-05 10:46:25,184 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Ignoring
obsolete copy result for Map Task: task_0004_m_000496_1 from host: rim.isi.edu
2007-07-05 10:46:25,184 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Ignoring
obsolete copy result for Map Task: task_0004_m_000495_1 from host: hey.isi.edu
2007-07-05 10:46:25,185 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Copying
of all map outputs complete. Initiating the last merge on the remaining files in ramfs://mapoutput24529889
2007-07-05 10:46:27,558 INFO org.apache.hadoop.mapred.ReduceTask: task_0004_r_000003_0 Merge
of the 503 files in InMemoryFileSystem complete. Local file is /usr/local/hadoop/hadoop-hadoop/mapred/local/task_0004_r_000003_0/map_332.out
2007-07-05 10:46:28,528 INFO org.apache.hadoop.streaming.PipeMapRed: R/W/S=1/0/0 in:0=1/342
[rec/s] out:0=0/342 [rec/s]
2007-07-05 10:46:28,575 INFO org.apache.hadoop.streaming.PipeMapRed: MROutputThread done
2007-07-05 10:46:28,601 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2007-07-05 10:46:28,612 INFO org.apache.hadoop.streaming.PipeMapRed: mapRedFinished
2007-07-05 10:46:28,794 WARN org.apache.hadoop.mapred.TaskTracker: Error running child



To try to figure out what's up, I tapped the I/O around my reducer by
replacing it with this shell script:

	#!/bin/sh
	# Tap the reducer's input and output so both can be inspected afterwards.
	tee /tmp/reduce.$$.in | $HOME/reduce_command 16 | tee /tmp/reduce.$$.out
	exit 0

And I get 0-length .in and .out files.

This doesn't seem right.  Hadoop streaming should give my reducing
program some input, right?  Without input I don't see that I can do
much.  (My guess is that I never get any input, and the task eventually
times out and gets killed.)
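Since the $$-named files come up empty, the next wrapper I'd try logs what actually reaches the script. This is only a sketch: `sort` stands in for my real reduce_command, the fixed /tmp names stand in for per-task ones, and the here-document stands in for the input Hadoop would deliver on stdin:

```shell
#!/bin/sh
# Sketch of a more talkative tap.  'sort' is a stand-in reducer and the
# here-document is stand-in input; in the real wrapper, input arrives on
# stdin, and anything written to stderr should end up in the task logs.
IN=/tmp/reduce.wrap.in
OUT=/tmp/reduce.wrap.out
tee "$IN" <<'EOF' | sort | tee "$OUT"
b
a
EOF
# Report how much input the reducer was actually given.
echo "captured $(wc -l < "$IN") input line(s)" >&2
```

With the real reducer in place, a nonzero line count in the task's stderr log would say Hadoop delivered input and the reducer died; zero lines would point at the framework never writing into the pipe.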


(When I look in the raw reduce directories, e.g.,
task_0004_m_000455_0, I do see reasonable-looking data, including my input.)



The other strange thing is that I don't get 100% reduce failures;
maybe 490 of 503 fail.


Any suggestions?

   -John Heidemann


