hadoop-mapreduce-user mailing list archives

From Geoffry Roberts <geoffry.robe...@gmail.com>
Subject Re: MapRed Job Completes; Output Ceases Mid-Job
Date Thu, 08 Oct 2009 15:04:41 GMT
Jason,

Quite possibly. Here's what I did: I upped "dfs.datanode.max.xcievers" to
512 (a doubling), and now the full set of output files is created
correctly.
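
For anyone who finds this in the archives later: the property goes in
hdfs-site.xml on each datanode, and the datanodes need a restart for it to
take effect. A sketch of what I set (512 is just double the value I had
before; tune it to your own workload):

```xml
<!-- hdfs-site.xml on each datanode; restart datanodes after changing.
     Raises the cap on concurrent xceiver threads serving block I/O. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>512</value>
</property>
```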

Thanks for responding.

Learning, learning the ins and outs of Hadoop.
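
In case it helps someone with the same symptom: a quick way to check
whether a task or datanode is bumping into the per-process file-descriptor
limit (assuming Linux; get the datanode pid from `jps`):

```shell
# Soft limit on open file descriptors for the current shell/user:
ulimit -n

# Count descriptors a running datanode currently holds open
# (substitute the datanode pid reported by `jps`):
# ls /proc/<datanode-pid>/fd | wc -l
```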

On Thu, Oct 8, 2009 at 6:01 AM, Jason Venner <jason.hadoop@gmail.com> wrote:

> Are you perhaps creating large numbers of files and running out of file
> descriptors in your tasks?
>
>
> On Wed, Oct 7, 2009 at 1:52 PM, Geoffry Roberts <geoffry.roberts@gmail.com
> > wrote:
>
>> All,
>>
>> I have a MapRed job that ceases to produce output about halfway through.
>> The obvious question is why?
>>
>> This job reads a file and uses MultipleTextOutputFormat to generate output
>> files named with the output key.  At about the halfway point, the job
>> continues to create files, but they are all of zero length.  I've worked
>> with this input file extensively; I know it actually contains the required
>> data and that it is clean, or at least it was when I copied it in.
>>
>> My first impulse was to check for a full disk, but there seems to be ample
>> free space.
>>
>> This doesn't appear to have anything to do with my code.
>>
>> stderr is full of the following entry:
>>
>> java.io.EOFException
>> 	at java.io.DataInputStream.readByte(DataInputStream.java:250)
>> 	at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>> 	at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>> 	at org.apache.hadoop.io.Text.readString(Text.java:400)
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2837)
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2762)
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
>>
>>
>> syslog for the reducer starts filling up with the following at what could
>> indeed be the halfway point:
>>
>> 2009-10-07 11:27:50,874 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
>> java.io.EOFException
>> 2009-10-07 11:27:50,916 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-1693260904457793456_3495
>> 2009-10-07 11:27:56,919 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
>> java.io.EOFException
>> 2009-10-07 11:27:56,919 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7536254999085848659_3495
>> 2009-10-07 11:28:02,921 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
>> java.io.EOFException
>> 2009-10-07 11:28:02,921 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7513223558440754487_3495
>> 2009-10-07 11:28:08,924 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream
>> java.io.EOFException
>> 2009-10-07 11:28:08,924 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2580888829875117043_3495
>> 2009-10-07 11:28:14,965 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception:
>> java.io.IOException: Unable to create new block.
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2781)
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
>> 	at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
>>
>
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
>
