Mailing-List: contact user-help@avro.apache.org; run by ezmlm
Precedence: bulk
Reply-To: user@avro.apache.org
Received-SPF: pass (nike.apache.org: local policy)
From: Ken Krugler <kkrugler_lists@transpac.com>
Mime-Version: 1.0 (Apple Message framework v1257)
Content-Type: multipart/alternative;
 boundary="Apple-Mail=_F34B7AF0-F209-46BC-8420-6F989F0677A4"
Subject: Re: Hadoop 0.23,
 Avro Specific 1.6.3 and "org.apache.avro.generic.GenericData$Record cannot be
 cast to "
Date: Sun, 13 May 2012 11:18:13 -0700
In-Reply-To: <DUB112-W42C6A00BDBCF4B807CA9B81150@phx.gbl>
To: user@avro.apache.org
References: <DUB112-W42C6A00BDBCF4B807CA9B81150@phx.gbl>
Message-Id: <6D45998A-81E7-49B8-9B4E-88390EB14933@transpac.com>


--Apple-Mail=_F34B7AF0-F209-46BC-8420-6F989F0677A4
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=windows-1252

Hi Jacob,

On May 13, 2012, at 4:48am, Jacob Metcalf wrote:

>
> I have just spent several frustrating hours on getting an example MR
> job using Avro working with Hadoop and after finally getting it
> working I thought I would share my findings with everyone.
>
> I wrote an example job trying to use Avro MR 1.6.3 to serialize
> between Map and Reduce then attempted to deploy and run. I am setting
> up a development cluster with Hadoop 0.23 running pseudo-distributed
> under cygwin. I ran my job and it failed with:
>
> "org.apache.avro.generic.GenericData$Record cannot be cast to
> net.jacobmetcalf.avro.Room"
>
> Where Room is an Avro generated class. I found two problems. The first
> I have partly solved, the second one is more to do with Hadoop and is
> as yet unsolved:
>
> 1) Why when I am using Avro Specific does it end up going Generic?
>
> When deserializing SpecificDatumReader.java attempts to instantiate
> your target class through reflection. If it fails to create your class
> it defaults to a GenericData.Record. This Doug has explained here:
> http://mail-archives.apache.org/mod_mbox/avro-user/201101.mbox/%3C4D2B6D56.2070108@apache.org%3E
>
> But why it is doing it was a little harder to work out. Debugging I
> saw the SpecificDatumReader could not find my class in its classpath.
> However in my Job Runner I had done:
>
>     // This should ensure the JAR is distributed around the cluster
>     job.setJarByClass(HouseAssemblyJob.class);
>
> I expected with this Hadoop would distribute my Jar around the
> cluster. It may be doing the distribution but it definitely did not
> add it to the Reducer's classpath. So to get round this I have now set
> HADOOP_CLASSPATH to the directory I am running from. This is not going
> to work in a real cluster where the Job Runner is on a different
> machine to where the Reducer is so I am keen to figure out whether the
> problem is Hadoop 0.23, my environment variables or the fact I am
> running under Cygwin.

If your reducer is running, then Hadoop must have distributed your job
jar.

In that case, any class that's actually in your job jar (in the proper
position) will be distributed and on the classpath.
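
One way to confirm this from inside a task is to ask the JVM where a
class was loaded from. The following is a minimal sketch using only the
JDK; the class name ClassOrigin and the idea of calling it from a
reducer's setup code are assumptions for illustration, not something
taken from Hadoop or Avro.

```java
import java.security.CodeSource;

public class ClassOrigin {

    /**
     * Returns the location a class is loaded from, "bootstrap" for JDK
     * core classes, or "NOT FOUND" if the loader cannot see the class.
     */
    public static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: locate the class without running its
            // static initializers.
            Class<?> cls = Class.forName(className, false, loader);
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes from the JDK's core runtime have no code source.
            return src == null ? "bootstrap" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Each argument is a fully qualified class name to look up.
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (String name : args) {
            System.out.println(name + " -> " + describe(name, loader));
        }
    }
}
```

Logging describe("net.jacobmetcalf.avro.Room", ...) and
describe("org.apache.avro.Schema", ...) from the task would show whether
the generated class is visible at all, and which jar is supplying Avro.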

Sometimes the problem is that you've got a dependent jar, which then
needs to be in the "lib" subdirectory inside of your job jar. Are you
maybe building your Avro generated classes into a separate jar, and then
adding that to the job jar?
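
The layout check described above can be sketched with the JDK's JarFile
API. This is only an illustration: the class JobJarLayout and its method
names are hypothetical, and the rule it encodes (classes at the jar root,
dependent jars under lib/) is the one stated in the paragraph above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class JobJarLayout {

    /** Nested jars that sit under lib/, where dependent jars belong. */
    public static List<String> bundledJars(List<String> entryNames) {
        List<String> out = new ArrayList<>();
        for (String n : entryNames) {
            if (n.endsWith(".jar") && n.startsWith("lib/")) out.add(n);
        }
        return out;
    }

    /** Nested jars anywhere else in the job jar. */
    public static List<String> misplacedJars(List<String> entryNames) {
        List<String> out = new ArrayList<>();
        for (String n : entryNames) {
            if (n.endsWith(".jar") && !n.startsWith("lib/")) out.add(n);
        }
        return out;
    }

    /** True if the class file sits at the path matching its package. */
    public static boolean hasClass(List<String> entryNames, String className) {
        return entryNames.contains(className.replace('.', '/') + ".class");
    }

    /** Reads all entry names from a jar, like "jar tf" would list them. */
    public static List<String> entryNames(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            List<String> names = new ArrayList<>();
            for (JarEntry e : Collections.list(jar.entries())) {
                names.add(e.getName());
            }
            return names;
        }
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the job jar; args[1]: class expected inside it.
        List<String> names = entryNames(args[0]);
        System.out.println("class present: " + hasClass(names, args[1]));
        System.out.println("jars in lib/:  " + bundledJars(names));
        System.out.println("jars elsewhere: " + misplacedJars(names));
    }
}
```

If the generated classes turn up only inside a nested jar, and that
nested jar is reported under "jars elsewhere", that would match the
situation described above.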

Finally, running under Cygwin is... challenging. I teach a Hadoop class,
and often the hardest part of the lab is getting everybody's Cygwin
installation working with Hadoop. The fact that you've got
pseudo-distributed mode working on Cygwin is impressive in itself, but I
would suggest trying your job on a real cluster, e.g. use Elastic
MapReduce.

> 2) How can I upgrade Hadoop 0.23 to use Avro 1.6.3 ?
>
> Whilst debugging I realised that Hadoop is shipping with Avro 1.5.3. I
> however want to use 1.6.3 (and 1.7 when it comes out) because of its
> support for immutability & builders in the generated classes. I
> probably could just hack the old Avro lib out of my Hadoop
> distribution and drop the new one in. However I thought it would be
> cleaner to get Hadoop to distribute my jar to all datanodes and then
> manipulate my classpath to get the latest version of Avro to the top.
> So I have packaged Avro 1.6.3 into my job jar using Maven assembly

Did you ensure that it's inside of the /lib subdirectory? What does your
job jar look like (via "jar tvf <path to job jar>")?

-- Ken

> and tried to do this in my JobRunner:
>
>     // This should ensure the JAR is distributed around the cluster
>     job.setJarByClass(MyJob.class);
>     // ensure my version of avro?
>     config.setBoolean(MRJobConfig.MAPREDUCE_JOB_USER_CLASSPATH_FIRST, true);
>
> But it continues to use 1.5.3. I suspect it is again to do with my
> HADOOP_CLASSPATH which has avro-1.5.3 in it:
>
>     export HADOOP_CLASSPATH="$HADOOP_COMMON_HOME/share/hadoop/mapreduce/*"
>
> If anyone has done this and has any ideas please let me know?
>
> Thanks
>
> Jacob
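
To see which of several jars would supply a class when they are searched
in order, the lookup can be reproduced locally. This is a rough sketch
under stated assumptions: FirstProvider is a hypothetical helper, it
scans only plain .jar entries in the order given, and it does not model
how Hadoop itself assembles or orders a task's classpath.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.jar.JarFile;

public class FirstProvider {

    /** Returns the first jar, in the given order, that contains the class. */
    public static Optional<Path> firstJarContaining(List<Path> orderedJars,
                                                    String className) {
        String entry = className.replace('.', '/') + ".class";
        for (Path jar : orderedJars) {
            try (JarFile jf = new JarFile(jar.toFile())) {
                if (jf.getJarEntry(entry) != null) {
                    return Optional.of(jar);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // args[0]: classpath string; args[1]: fully qualified class name.
        List<Path> jars = new ArrayList<>();
        for (String part : args[0].split(File.pathSeparator)) {
            // Directories and wildcard entries are skipped in this sketch.
            if (part.endsWith(".jar")) {
                jars.add(Paths.get(part));
            }
        }
        System.out.println(firstJarContaining(jars, args[1]));
    }
}
```

Running it with the expanded classpath and org.apache.avro.Schema would
show whether an avro-1.5.3 jar appears ahead of the 1.6.3 one.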

--------------------------
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Mahout & Solr


--Apple-Mail=_F34B7AF0-F209-46BC-8420-6F989F0677A4--