hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: hadoop streaming on Amazon EC2
Date Wed, 02 Jun 2010 08:56:17 GMT
Depending on which Hadoop version EC2 uses (0.18.3?), you can try one of the following:

1. Compile the streaming jar with your own custom classes and run on EC2 using this
custom jar (should work for 0.18.3; make sure you pick compatible streaming classes).

2. Jar up your classes and pass the jar via the -libjars option on the command line, then specify the
custom input and output formats as you do on your local machine (should work for >= 0.19.0).
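For option 2, the invocation might look something like the sketch below. This is only an illustration: `myformats.jar`, the class name, and the streaming jar path are placeholders, not names from this thread, and -libjars (a generic Hadoop option) must appear before the streaming-specific arguments.

```shell
# Hypothetical sketch: ship a jar of custom format classes with a streaming job.
# Paths and class names are placeholders; adjust for your install.
bin/hadoop jar contrib/streaming/hadoop-streaming.jar \
    -libjars myformats.jar \
    -input input/ \
    -output output/ \
    -mapper /bin/cat \
    -inputformat org.example.MyInputFormat
```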

I have never worked on EC2, so not sure if any easier solution exists.


On 6/2/10 1:52 AM, "Mo Zhou" <mozhou@umail.iu.edu> wrote:


I know it may not be suitable to post this here, since it relates more to
EC2 than to Hadoop. However, I could not find a solution and hope
someone here can kindly help me out. Here is my question.

I created my own input reader and output formatter to split an input
file while using Hadoop streaming. They are tested on my local machine.
The following is how I use them.

bin/hadoop jar hadoop-0.20.2-streaming.jar \
   -D mapred.map.tasks=4 \
   -D mapred.reduce.tasks=0 \
   -input HumanSeqs.4 \
   -output output \
   -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
   -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
   -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
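The job of a FASTA record reader like the one named above is to group input lines into one record per sequence, where each record starts at a `>` header line. That splitting logic can be illustrated in plain Python (a hypothetical helper for illustration only, not the Hadoop class from this thread):

```python
def fasta_records(lines):
    """Group an iterable of text lines into (header, sequence) FASTA records.

    A record begins at a line starting with '>' and its sequence is the
    concatenation of all following lines up to the next header.
    """
    header, seq = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)  # emit the previous record
            header, seq = line, []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)  # emit the final record

if __name__ == "__main__":
    sample = [">seq1\n", "MKV\n", "LLT\n", ">seq2\n", "GGA\n"]
    for h, s in fasta_records(sample):
        print(h, s)
```

A custom Hadoop RecordReader does the same grouping, but additionally has to handle file-split boundaries so that a record spanning two splits is read exactly once.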

I want to deploy the job to Elastic MapReduce. I first create a
streaming job and specify the input and output in S3, the mapper,
and the reducer. However, I could not find where to specify
-inputreader and -inputformat.

So my questions are:
1) how can I upload the class files to be used as the input reader and
input format to Elastic MapReduce?
2) how do I specify them in the streaming job?

Any reply is appreciated. Thanks for your time!

