From: Ted Yu <yuzhihong@gmail.com>
To: mapreduce-user@hadoop.apache.org
Subject: Re: block errors
Date: Sat, 24 Jul 2010 06:51:27 -0700
In-Reply-To: <201007131057.o6DAv96D002802@post.webmailer.de>

Check the datanode log on 10.15.46.73.

You should increase dfs.datanode.max.xcievers.
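The "Bad connect ack with firstBadLink" errors below are what the client sees when a datanode refuses another connection, which commonly happens once the datanode's transceiver limit is exhausted. As a sketch (4096 is a commonly suggested value, not something tuned for your cluster), the limit goes in hdfs-site.xml on every datanode, followed by a datanode restart:

  <property>
    <name>dfs.datanode.max.xcievers</name>
    <value>4096</value>
  </property>

Note the property name really is spelled "xcievers". The default in this era is 256, which is easy to exhaust under heavy concurrent block writes like the reduce-output phase in your log.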
On Tue, Jul 13, 2010 at 3:57 AM, Some Body <somebody@squareplanet.de> wrote:

> Hi All,
>
> I had a MR job that processed 2000 small (<3MB ea.) files, and it took 40
> minutes on 8 nodes. Since the files are small, it triggered 2000 tasks.
> I packed my 2000 files into a single 445MB sequence file
> (K,V == Text,Text == <filename>,<file-content>). The new MR job triggers
> 7 map tasks (approx 64MB each), but it takes even longer (49 minutes),
> so I'm trying to figure out why.
>
> I noticed the following errors, and I'm hoping someone can shed some
> light on why they occur.
>
> Before I ran the job I ran 'hadoop fsck /' and everything was healthy:
> no under-replicated blocks, no corrupt blocks, etc.
>
> ......
> 2010-07-13 03:24:20,807 INFO org.apache.hadoop.mapred.ReduceTask: GetMapEventsThread exiting
> 2010-07-13 03:24:20,807 INFO org.apache.hadoop.mapred.ReduceTask: getMapsEventsThread joined.
> 2010-07-13 03:24:20,808 INFO org.apache.hadoop.mapred.ReduceTask: Closed ram manager
> 2010-07-13 03:24:20,808 INFO org.apache.hadoop.mapred.ReduceTask: Interleaved on-disk merge complete: 7 files left.
> 2010-07-13 03:24:20,808 INFO org.apache.hadoop.mapred.ReduceTask: In-memory merge complete: 0 files left.
> 2010-07-13 03:24:20,814 INFO org.apache.hadoop.mapred.ReduceTask: Merging 7 files, 2401573706 bytes from disk
> 2010-07-13 03:24:20,815 INFO org.apache.hadoop.mapred.ReduceTask: Merging 0 segments, 0 bytes from memory into reduce
> 2010-07-13 03:24:20,818 INFO org.apache.hadoop.mapred.Merger: Merging 7 sorted segments
> 2010-07-13 03:24:20,827 INFO org.apache.hadoop.mapred.Merger: Down to the last merge-pass, with 7 segments left of total size: 2401573678 bytes
> 2010-07-13 03:30:42,329 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.15.46.73:50010
> 2010-07-13 03:30:42,329 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_4304053493083580280_260714
> 2010-07-13 03:31:03,846 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.15.46.35:50010
> 2010-07-13 03:31:03,846 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_3680469905814989852_260716
> 2010-07-13 03:31:08,233 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.15.46.35:50010
> 2010-07-13 03:31:08,233 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-673505196560500372_260717
> 2010-07-13 03:31:14,243 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.IOException: Bad connect ack with firstBadLink 10.15.46.73:50010
> 2010-07-13 03:31:14,243 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7054031797345836167_260717
> ......
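For reference, packing small files into a (Text, Text) sequence file as described above can be done with a short driver like the one below. This is a minimal sketch against the 0.20-era API; the class name SmallFilePacker, the jar and path names, and the choice to read each whole file into memory are all illustrative, reasonable only because each input is under 3MB.

import java.io.DataInputStream;
import java.io.File;
import java.io.FileInputStream;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Sketch: pack local files into one SequenceFile of
// (filename, file-content) Text pairs.
public class SmallFilePacker {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path out = new Path(args[0]); // output SequenceFile on HDFS
    SequenceFile.Writer writer =
        SequenceFile.createWriter(fs, conf, out, Text.class, Text.class);
    try {
      for (int i = 1; i < args.length; i++) {
        File f = new File(args[i]);
        byte[] bytes = new byte[(int) f.length()];
        DataInputStream in = new DataInputStream(new FileInputStream(f));
        try {
          in.readFully(bytes); // inputs are <3MB each, so one read is fine
        } finally {
          in.close();
        }
        // Text assumes UTF-8; use BytesWritable instead for binary content.
        writer.append(new Text(f.getName()), new Text(bytes));
      }
    } finally {
      writer.close();
    }
  }
}

A run might look like: hadoop jar packer.jar SmallFilePacker /user/somebody/packed.seq local/*.txt (names illustrative). A ~445MB file with 64MB blocks yields 7 splits, hence the 7 map tasks seen above.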