From: Amogh Vasekar
To: "common-user@hadoop.apache.org"
Date: Wed, 2 Jun 2010 18:10:05 +0530
Subject: Re: hadoop streaming on Amazon EC2

Hi,

You might need to add -Dstream.shipped.hadoopstreaming=

Amogh
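For reference, a minimal sketch of how that flag might be added to the streaming command quoted further down, assuming the property should point at the streaming jar you actually run; the jar path is a placeholder, not something from this thread, and option order may need adjusting for your Hadoop version:

    # /path/to/hadoop-streaming.jar is a placeholder for your local streaming jar.
    bin/hadoop jar /path/to/hadoop-streaming.jar \
        -Dstream.shipped.hadoopstreaming=/path/to/hadoop-streaming.jar \
        -D mapred.map.tasks=4 \
        -D mapred.reduce.tasks=0 \
        -input HumanSeqs.4 \
        -output output \
        -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
        -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
        -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"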
On 6/2/10 5:10 PM, "Mo Zhou" wrote:

Thank you, Amogh.

Elastic MapReduce uses 0.18.3. I tried the first way by downloading hadoop-0.18.3 to my local machine. Then I got the following warning:

WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).

So the results were incorrect.

Thanks,
Mo

On Wed, Jun 2, 2010 at 4:56 AM, Amogh Vasekar wrote:
> Hi,
> Depending on which Hadoop version EC2 uses (0.18.3?), you can try one of the following:
>
> 1. Compile the streaming jar with your own custom classes and run on EC2 using this
> custom jar (should work for 0.18.3; make sure you pick compatible streaming classes).
>
> 2. Jar up your classes and pass them with the -libjars option on the command line, then
> specify the custom input and output formats as you do on your local machine (should work
> for 0.19.0 and later). [A sketch of this appears at the end of this message.]
>
> I have never worked on EC2, so I am not sure whether an easier solution exists.
>
> Amogh
>
> On 6/2/10 1:52 AM, "Mo Zhou" wrote:
>
> Hi,
>
> I know this may not be suitable to post here since it relates more to EC2 than to Hadoop.
> However, I could not find a solution and hope someone here can kindly help me out.
> Here is my question.
>
> I created my own input reader and output formatter to split an input file while using
> hadoop streaming. They are tested on my local machine. The following is how I use them:
>
> bin/hadoop jar hadoop-0.20.2-streaming.jar \
>     -D mapred.map.tasks=4 \
>     -D mapred.reduce.tasks=0 \
>     -input HumanSeqs.4 \
>     -output output \
>     -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
>     -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
>     -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
>
> I want to deploy the job to Elastic MapReduce. I first create a streaming job and specify
> the input and output in S3, the mapper, and the reducer. However, I could not find a place
> to specify -inputreader and -inputformat.
>
> So my questions are:
> 1) How can I upload the class files to be used as the inputreader and inputformat to
> Elastic MapReduce?
> 2) How do I tell the streaming job to use them?
>
> Any reply is appreciated. Thanks for your time!
>
> --
> Thanks,
> Mo

--
Thanks,
Mo
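A minimal sketch of the second suggestion above (jar up the custom classes and pass them with -libjars), assuming Hadoop 0.19 or later; the jar name fasta-formats.jar and the compiled-class layout are hypothetical, only the class names come from this thread:

    # Package the custom record reader and input format classes.
    # (Jar name and class-file paths are placeholders.)
    jar cf fasta-formats.jar \
        org/apache/hadoop/streaming/StreamFastaRecordReader.class \
        org/apache/hadoop/streaming/StreamFastaInputFormat.class

    # -libjars and -D are generic options, so they must come before
    # streaming-specific options such as -input and -mapper.
    bin/hadoop jar hadoop-0.20.2-streaming.jar \
        -libjars fasta-formats.jar \
        -D mapred.map.tasks=4 \
        -D mapred.reduce.tasks=0 \
        -input HumanSeqs.4 \
        -output output \
        -mapper "./blastp -db nr -evalue 0.001 -outfmt 6" \
        -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
        -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"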