hadoop-mapreduce-user mailing list archives

From Jason Venner <jason.had...@gmail.com>
Subject Re: MultipleTextOutputFormat giving "Bad connect ack with firstBadLink"
Date Tue, 27 Oct 2009 13:45:33 GMT
This error is very common in applications that run out of file descriptors,
or that simply open vast numbers of files on an HDFS cluster with a very high
block density per datanode.
It is quite easy to open hundreds of thousands of files with the
Multi*OutputFormat classes.
If you can collect your output in the local file system, generate a zip file
of it, and write that single file to HDFS, there is generally an enormous
performance gain, enormous meaning greater than 10x.
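A minimal sketch of the collect-locally-then-zip approach, using only java.util.zip (the directory layout and file names below are illustrative, not taken from Tim's job):

```java
import java.io.IOException;
import java.nio.file.*;
import java.util.zip.*;

public class ZipLocalOutput {

    // Bundle every regular file in a local output directory into one zip.
    // The archive can then be written to HDFS as a single file, so the
    // DFSClient allocates blocks for one file instead of one per key.
    // Returns the number of entries written.
    static int zipDirectory(Path localDir, Path zipFile) throws IOException {
        int count = 0;
        try (ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(zipFile));
             DirectoryStream<Path> files = Files.newDirectoryStream(localDir)) {
            for (Path f : files) {
                if (!Files.isRegularFile(f)) continue;
                zos.putNextEntry(new ZipEntry(f.getFileName().toString()));
                Files.copy(f, zos);   // stream the file's contents into the entry
                zos.closeEntry();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Simulate two per-key output files like those a Multi*OutputFormat
        // would create, then bundle them into a single archive.
        Path dir = Files.createTempDirectory("denorm");
        Files.writeString(dir.resolve("denorm_taxonomy_1"), "a\trow1\n");
        Files.writeString(dir.resolve("denorm_taxonomy_2"), "b\trow2\n");
        Path zip = Files.createTempFile("output", ".zip");
        System.out.println("entries: " + zipDirectory(dir, zip));
    }
}
```

The resulting zip could then be pushed to HDFS with, for example, `hadoop fs -put output.zip /output/`, touching one HDFS file rather than hundreds of thousands.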

On Tue, Oct 27, 2009 at 8:24 AM, tim robertson <timrobertson100@gmail.com>wrote:

> Hi all,
>
> I am running a simple job working on an input tab file, running the
> following:
>
> - a simple Mapper which reads a field from the tab file row and
> emits it as the key and the whole line as the value.
> - an Identity reducer
> - a MultipleTextOutputFormat emitting a filename based on the key like so:
>
>  protected String generateFileNameForKeyValue(Object key, Object value, String name) {
>    return "denorm_taxonomy_" + key.toString();
>  }
>
> This job works fine with a TextOutputFormat, but when I use the
> MultipleTextOutputFormat I get the following errors.
>
> Can someone please help me diagnose this?
>
> Many thanks,
> Tim
>
>
>
>
> 09/10/27 14:10:01 INFO mapred.JobClient:  map 100% reduce 95%
> 09/10/27 14:10:20 INFO mapred.JobClient:  map 100% reduce 96%
> 09/10/27 14:10:24 INFO mapred.JobClient: Task Id :
> attempt_200910191701_0063_r_000000_1, Status : FAILED
> java.io.IOException: Bad connect ack with firstBadLink 192.168.76.8:50010
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2870)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 09/10/27 14:10:25 INFO mapred.JobClient:  map 100% reduce 94%
> 09/10/27 14:10:28 INFO mapred.JobClient:  map 100% reduce 97%
> 09/10/27 14:10:58 INFO mapred.JobClient: Task Id :
> attempt_200910191701_0063_r_000001_2, Status : FAILED
> java.io.IOException: Bad connect ack with firstBadLink 192.168.76.5:50010
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2870)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>
> 09/10/27 14:11:00 INFO mapred.JobClient:  map 100% reduce 98%
> 09/10/27 14:11:01 INFO mapred.JobClient: Task Id :
> attempt_200910191701_0063_r_000001_1, Status : FAILED
> java.io.EOFException
>        at java.io.DataInputStream.readByte(DataInputStream.java:250)
>        at
> org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>        at
> org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>        at org.apache.hadoop.io.Text.readString(Text.java:400)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2868)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2793)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2076)
>        at
> org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2262)
>



-- 
Pro Hadoop, a book to guide you from beginner to hadoop mastery,
http://www.amazon.com/dp/1430219424?tag=jewlerymall
www.prohadoopbook.com a community for Hadoop Professionals
