From: Amogh Vasekar
To: "common-user@hadoop.apache.org"
Date: Wed, 2 Jun 2010 18:10:05 +0530
Subject: Re: hadoop streaming on Amazon EC2

Hi,

You might need to add -Dstream.shipped.hadoopstreaming=

Amogh
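For reference, a minimal sketch of how that flag might be added to the streaming command quoted further down, assuming the property should point at the streaming jar you actually run; the jar path is a placeholder, not something from this thread, and option order may need adjusting for your Hadoop version:

    # /path/to/hadoop-streaming.jar is a placeholder for your local streaming jar.
    bin/hadoop jar /path/to/hadoop-streaming.jar \
        -Dstream.shipped.hadoopstreaming=/path/to/hadoop-streaming.jar \
        -D mapred.map.tasks=4 \
        -D mapred.reduce.tasks=0 \
        -input HumanSeqs.4 \
        -output output \
        -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
        -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
        -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"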
On 6/2/10 5:10 PM, "Mo Zhou" wrote:

Thank you, Amogh.

Elastic MapReduce uses 0.18.3. I tried the first way by downloading hadoop-0.18.3 to my local machine. Then I got the following warning:

WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

So the results were incorrect.

Thanks,
Mo

On Wed, Jun 2, 2010 at 4:56 AM, Amogh Vasekar wrote:
> Hi,
> Depending on which Hadoop version EC2 uses (0.18.3?), you can try one of the following:
>
> 1. Compile the streaming jar with your own custom classes and run on EC2 using this
> custom jar (should work for 0.18.3; make sure you pick compatible streaming classes).
>
> 2. Jar up your classes and pass them with the -libjars option on the command line, then
> specify the custom input and output formats as you do on your local machine (should work
> for 0.19.0 and later). [A sketch of this appears at the end of this message.]
>
> I have never worked on EC2, so I am not sure whether an easier solution exists.
>
> Amogh
>
> On 6/2/10 1:52 AM, "Mo Zhou" wrote:
>
> Hi,
>
> I know this may not be suitable to post here since it relates more to EC2 than to Hadoop.
> However, I could not find a solution and hope someone here can kindly help me out.
> Here is my question.
>
> I created my own input reader and output formatter to split an input file while using
> hadoop streaming. They are tested on my local machine. The following is how I use them:
>
> bin/hadoop jar hadoop-0.20.2-streaming.jar \
>     -D mapred.map.tasks=4 \
>     -D mapred.reduce.tasks=0 \
>     -input HumanSeqs.4 \
>     -output output \
>     -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
>     -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
>     -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
>
> I want to deploy the job to Elastic MapReduce. I first create a streaming job and specify
> the input and output in S3, the mapper, and the reducer. However, I could not find a place
> to specify -inputreader and -inputformat.
>
> So my questions are:
> 1) How can I upload the class files to be used as the inputreader and inputformat to
> Elastic MapReduce?
> 2) How do I tell the streaming job to use them?
>
> Any reply is appreciated. Thanks for your time!
>
> --
> Thanks,
> Mo

--
Thanks,
Mo
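A minimal sketch of the second suggestion above (jar up the custom classes and pass them with -libjars), assuming Hadoop 0.19 or later; the jar name fasta-formats.jar and the compiled-class layout are hypothetical, only the class names come from this thread:

    # Package the custom record reader and input format classes.
    # (Jar name and class-file paths are placeholders.)
    jar cf fasta-formats.jar \
        org/apache/hadoop/streaming/StreamFastaRecordReader.class \
        org/apache/hadoop/streaming/StreamFastaInputFormat.class

    # -libjars and -D are generic options, so they must come before
    # streaming-specific options such as -input and -mapper.
    bin/hadoop jar hadoop-0.20.2-streaming.jar \
        -libjars fasta-formats.jar \
        -D mapred.map.tasks=4 \
        -D mapred.reduce.tasks=0 \
        -input HumanSeqs.4 \
        -output output \
        -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
        -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
        -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"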