Date: Thu, 8 Oct 2009 07:04:41 -0800
Subject: Re: MapRed Job Completes; Output Ceases Mid-Job
From: Geoffry Roberts <geoffry.roberts@gmail.com>
To: mapreduce-user@hadoop.apache.org

Jason,

Quite possibly. Here's what I did: I upped "dfs.datanode.max.xcievers" to
512, which is a doubling, and the full set of output files is now created
correctly.

Thanks for responding. Learning, learning the ins and outs of Hadoop.

On Thu, Oct 8, 2009 at 6:01 AM, Jason Venner <jason.hadoop@gmail.com> wrote:

> Are you perhaps creating large numbers of files and running out of file
> descriptors in your tasks?
>
> On Wed, Oct 7, 2009 at 1:52 PM, Geoffry Roberts <geoffry.roberts@gmail.com> wrote:
>
>> All,
>>
>> I have a MapRed job that ceases to produce output about halfway through.
>> The obvious question is why?
>>
>> This job reads a file and uses MultipleTextOutputFormat to generate
>> output files named with the output key. At about the halfway point, the
>> job continues to create files, but they are all of zero length. I've
>> worked with this input file extensively; I know it actually contains the
>> required data and that it is clean, or at least it was when I copied it in.
>>
>> My first impulse was to check for a full disk, but there seems to be
>> ample free space.
>>
>> This doesn't appear to have anything to do with my code.
>>
>> stderr is full of the following entry:
>>
>> java.io.EOFException
>>     at java.io.DataInputStream.readByte(DataInputStream.java:250)
>>     at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
>>     at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
>>     at org.apache.hadoop.io.Text.readString(Text.java:400)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2837)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2762)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
>>
>> syslog for the reducer starts filling up with the following at what
>> could indeed be the halfway point:
>>
>> 2009-10-07 11:27:50,874 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> 2009-10-07 11:27:50,916 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-1693260904457793456_3495
>> 2009-10-07 11:27:56,919 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> 2009-10-07 11:27:56,919 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_7536254999085848659_3495
>> 2009-10-07 11:28:02,921 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> 2009-10-07 11:28:02,921 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_-7513223558440754487_3495
>> 2009-10-07 11:28:08,924 INFO org.apache.hadoop.hdfs.DFSClient: Exception in createBlockOutputStream java.io.EOFException
>> 2009-10-07 11:28:08,924 INFO org.apache.hadoop.hdfs.DFSClient: Abandoning block blk_2580888829875117043_3495
>> 2009-10-07 11:28:14,965 WARN org.apache.hadoop.hdfs.DFSClient: DataStreamer Exception: java.io.IOException: Unable to create new block.
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2781)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2046)
>>     at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2232)
>
> --
> Pro Hadoop, a book to guide you from beginner to hadoop mastery,
> http://www.amazon.com/dp/1430219424?tag=jewlerymall
> www.prohadoopbook.com a community for Hadoop Professionals
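[Archive note] The fix described in the thread above, raising "dfs.datanode.max.xcievers" to 512, is a datanode-side setting. A minimal sketch of what that change looks like in hdfs-site.xml, assuming the 512 value from this thread (the property name's odd spelling, "xcievers", is the one Hadoop of this era actually used):

```xml
<!-- hdfs-site.xml on each datanode: sketch of the fix from the thread.
     Raises the cap on concurrent block transfer threads per datanode
     from the old default of 256 to 512; requires a datanode restart. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>512</value>
</property>
```

This matters here because MultipleTextOutputFormat opens one HDFS output stream per distinct key, so a job with many keys can exhaust the datanodes' transfer-thread budget mid-job, which surfaces on the client as the EOFException/"Unable to create new block" errors quoted above.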