hadoop-common-user mailing list archives

From Mo Zhou <moz...@umail.iu.edu>
Subject hadoop streaming on Amazon EC2
Date Tue, 01 Jun 2010 20:22:33 GMT

I know this may not be the best place to post, since it relates more to
EC2 than to Hadoop itself. However, I could not find a solution and hope
someone here can kindly help me out. Here is my question.

I created my own input reader and input format classes to split an input
file when using Hadoop Streaming. They have been tested on my local machine.
Here is how I use them:

bin/hadoop jar hadoop-0.20.2-streaming.jar \
   -D mapred.map.tasks=4 \
   -D mapred.reduce.tasks=0 \
   -input HumanSeqs.4 \
   -output output \
   -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
   -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
   -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
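For context, the boundary my custom reader has to respect is the FASTA
convention: each record begins with a '>' header line and runs until the
next header. A minimal sketch of that splitting logic in plain Python
(function name and details are my own illustration; the real classes are
written against the Hadoop 0.20 streaming API):

```python
def read_fasta_records(lines):
    """Yield one FASTA record at a time from an iterable of lines.

    A record starts at a line beginning with '>' and ends just before
    the next '>' line (or at end of input) -- the same boundary the
    custom record reader uses when splitting the input file.
    """
    record = []
    for line in lines:
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        record.append(line)
    if record:
        yield "".join(record)
```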

I now want to deploy the job to Elastic MapReduce. I first create a
streaming job and specify the input and output locations in S3, the
mapper, and the reducer. However, I cannot find where to specify
-inputreader and -inputformat.

So my questions are:
1) How can I upload the class files to be used as the input reader and
input format to Elastic MapReduce?
2) How do I specify them in the streaming job?

Any reply is appreciated. Thanks for your time!

