hadoop-common-user mailing list archives

From Mo Zhou <moz...@umail.iu.edu>
Subject Re: hadoop streaming on Amazon EC2
Date Wed, 02 Jun 2010 11:40:35 GMT
Thank you, Amogh. Elastic MapReduce uses 0.18.3.

I tried the first approach after downloading hadoop-0.18.3 to my local
machine, and got the following warning:

WARN mapred.JobClient: No job jar file set.  User classes may not be
found. See JobConf(Class) or JobConf#setJar(String).

So the results were incorrect.

Thanks,
Mo


On Wed, Jun 2, 2010 at 4:56 AM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
> Hi,
> Depending on which Hadoop version EC2 uses (0.18.3?), you can try one of the following:
>
> 1. Compile the streaming jar files with your own custom classes and run on EC2 using
> this custom jar (should work for 0.18.3; make sure you pick compatible streaming classes).
>
> 2. Jar up your classes and pass them with the -libjars option on the command line, and specify
> the custom input and output formats as you do on your local machine (should work for versions
> after 0.19.0).
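
The second option above can be sketched as follows. This is only an illustration, assuming a Hadoop release whose streaming jar accepts the generic `-libjars` option (per the reply, newer than 0.19.0, so not the 0.18.3 mentioned in this thread). `custom-formats.jar` is a hypothetical name for a jar holding the custom reader and input format classes. The other arguments are copied from the command in the original post. The function only prints the arguments, one per line, so the command can be reviewed before anything is submitted.

```shell
#!/bin/sh
# Sketch: print the streaming command with -libjars added, one argument per line.
# $1 = streaming jar, $2 = jar with the custom classes (hypothetical name),
# $3 = input path, $4 = output path.
# Generic options (-libjars, -D) are placed before the streaming options.
streaming_cmd() {
  printf '%s\n' \
    bin/hadoop jar "$1" \
    -libjars "$2" \
    -D mapred.map.tasks=4 \
    -D mapred.reduce.tasks=0 \
    -input "$3" \
    -output "$4" \
    -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
    -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
    -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
}

# Dry run: show the arguments without submitting a job.
streaming_cmd hadoop-0.20.2-streaming.jar custom-formats.jar HumanSeqs.4 output
```

Whether Elastic MapReduce exposes a place to pass these extra arguments is not settled in this thread.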
>
> I have never worked on EC2, so I am not sure whether an easier solution exists.
>
>
> Amogh
>
>
> On 6/2/10 1:52 AM, "Mo Zhou" <mozhou@umail.iu.edu> wrote:
>
> Hi,
>
> I know this may not be suitable to post here, since it relates to
> EC2 more than Hadoop. However, I could not find a solution and hope
> someone here could kindly help me out. Here is my question.
>
> I created my own inputreader and outputformatter to split an input
> file while using Hadoop streaming. They are tested on my local machine.
> The following is how I use them.
>
> bin/hadoop jar hadoop-0.20.2-streaming.jar \
>   -D mapred.map.tasks=4 \
>   -D mapred.reduce.tasks=0 \
>   -input HumanSeqs.4 \
>   -output output \
>   -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
>   -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
>   -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
>
>
> I want to deploy the job to Elastic MapReduce. I first created a
> streaming job and specified the input and output in S3, the mapper,
> and the reducer. However, I could not find where to specify
> -inputreader and -inputformat.
>
> So my questions are:
> 1) How can I upload the class files to be used as inputreader and
> inputformat to Elastic MapReduce?
> 2) How do I specify that streaming should use them?
>
> Any reply is appreciated. Thanks for your time!
>
> --
> Thanks,
> Mo
>
>



