crunch-user mailing list archives

From Josh Wills <jwi...@cloudera.com>
Subject Re: Crunch on EMR
Date Tue, 01 Oct 2013 20:10:08 GMT
Hey Som,

You should be able to use any of the non-hadoop2 jars for Crunch on EMR,
like the regular 0.7.0:

http://mvnrepository.com/artifact/org.apache.crunch/crunch-core/0.7.0

Those are compiled against the MR1 APIs. You're getting the
TaskInputOutputContext exception because your jar was built against the MR2
APIs that CDH4.3.0 and hadoop2 use (the API changed from MR1 to MR2).
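
If you want to confirm which flavor of the API a given classpath carries, a quick reflection check is one option. This is only a sketch: ApiShapeCheck and describe are hypothetical names made up for illustration, and it assumes the Hadoop jars you care about are on the classpath when you run it. Going by the error message and Som's note, Hadoop 1.0.3 should report "class", while the hadoop2-style APIs should report "interface".

```java
// Reports whether a type on the current classpath is an interface or a class.
// ApiShapeCheck is a hypothetical helper, not part of Crunch or Hadoop.
final class ApiShapeCheck {

    // The Hadoop type named in the error message from this thread.
    static final String CONTEXT_TYPE =
            "org.apache.hadoop.mapreduce.TaskInputOutputContext";

    // Returns "interface", "class", or "not loadable" for the given type name.
    static String describe(String className) {
        try {
            // initialize=false: load the type without running static initializers.
            Class<?> type = Class.forName(
                    className, false, ApiShapeCheck.class.getClassLoader());
            return type.isInterface() ? "interface" : "class";
        } catch (ClassNotFoundException | LinkageError e) {
            // The type (or something it depends on) is missing from the classpath.
            return "not loadable";
        }
    }

    public static void main(String[] args) {
        System.out.println(CONTEXT_TYPE + " is: " + describe(CONTEXT_TYPE));
    }
}
```

Running it once against the cluster's Hadoop jars and once against the jars bundled in the fat jar should show whether the two disagree.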

Josh


On Tue, Oct 1, 2013 at 12:00 PM, Som Satpathy <somsatpathy@gmail.com> wrote:

> Hi All,
>
> I have been trying to run Crunch jobs on Amazon EMR and hit a problem
> during job execution:
>
> "Found class org.apache.hadoop.mapreduce.TaskInputOutputContext, but
> interface was expected"
>
> This is happening because of an incompatibility between the Hadoop APIs
> the job was built against and the Hadoop code that runs in the
> cluster.
>
> My Crunch fat jar is based on Crunch version 0.7 (CDH 4.3.0), while EMR
> runs Hadoop 1.0.3 (where TaskInputOutputContext is implemented as an
> abstract class).
>
> Has anyone been able to successfully run Crunch jobs on EMR?
>
> If yes, what are the best practices for making custom Crunch fat jars work
> on EMR?
>
>
> Look forward to hearing your thoughts.
>
> Thanks,
>
> Som
>



-- 
Director of Data Science
Cloudera <http://www.cloudera.com>
Twitter: @josh_wills <http://twitter.com/josh_wills>
