hadoop-common-user mailing list archives

From Amogh Vasekar <am...@yahoo-inc.com>
Subject Re: hadoop streaming on Amazon EC2
Date Thu, 03 Jun 2010 04:40:05 GMT
$ bin/hadoop jar <custom_streaming_jar> \
Should work.
Check $ bin/hadoop jar hadoop-0.18.3-streaming.jar -info for more details.
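The 0.18.3 streaming parser rejects the generic -D option (its usage output lists only -jobconf for setting job properties), so the equivalent invocation would pass the property via -jobconf. A hedged sketch, reusing the class names and jar name from the thread and keeping the <custom_streaming_jar> placeholder:

```shell
# Sketch only: -jobconf replaces -D, which 0.18.3 streaming does not accept.
# <custom_streaming_jar> is the jar containing the custom streaming classes.
bin/hadoop jar <custom_streaming_jar> \
    -jobconf stream.shipped.hadoopstreaming=<custom_streaming_jar> \
    -input HumanSeqs.4 \
    -output output \
    -mapper "cat -" \
    -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader,begin=>" \
    -inputformat org.apache.hadoop.streaming.StreamFastaInputFormat
```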


On 6/2/10 10:15 PM, "Mo Zhou" <mozhou@umail.iu.edu> wrote:

Thank you Amogh.

I tried that, but it threw the following exception:

$ bin/hadoop  jar hadoop-0.18.3-streaming.jar \
>     -D stream.shipped.hadoopstreaming=fasta.jar\
>     -input HumanSeqs.4\
>     -output output\
>     -mapper "cat -"\
>     -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader,begin=>"\
>     -inputformat org.apache.hadoop.streaming.StreamFastaInputFormat
10/06/02 12:44:35 ERROR streaming.StreamJob: Unexpected -D while
processing -input|-output|-mapper|-combiner|-reducer|-file|-dfs|-jt|-additionalconfspec|-inputformat|-outputformat|-partitioner|-numReduceTasks|-inputreader|-mapdebug|-reducedebug|||-cacheFile|-cacheArchive|-verbose|-info|-debug|-inputtagged|-help
Usage: $HADOOP_HOME/bin/hadoop [--config dir] jar \
          $HADOOP_HOME/hadoop-streaming.jar [options]
  -input    <path>     DFS input file(s) for the Map step
  -output   <path>     DFS output directory for the Reduce step
  -mapper   <cmd|JavaClassName>      The streaming command to run
  -combiner <JavaClassName> Combiner has to be a Java class
  -reducer  <cmd|JavaClassName>      The streaming command to run
  -file     <file>     File/dir to be shipped in the Job jar file
  -dfs    <h:p>|local  Optional. Override DFS configuration
  -jt     <h:p>|local  Optional. Override JobTracker configuration
  -additionalconfspec specfile  Optional.
  -inputformat TextInputFormat(default)|SequenceFileAsTextInputFormat|JavaClassName
  -outputformat TextOutputFormat(default)|JavaClassName  Optional.
  -partitioner JavaClassName  Optional.
  -numReduceTasks <num>  Optional.
  -inputreader <spec>  Optional.
  -jobconf  <n>=<v>    Optional. Add or override a JobConf property
  -cmdenv   <n>=<v>    Optional. Pass env.var to streaming commands
  -mapdebug <path>  Optional. To run this script when a map task fails
  -reducedebug <path>  Optional. To run this script when a reduce task fails
  -cacheFile fileNameURI
  -cacheArchive fileNameURI

For more details about these options:
Use $HADOOP_HOME/bin/hadoop jar build/hadoop-streaming.jar -info

        at org.apache.hadoop.streaming.StreamJob.fail(StreamJob.java:550)
        at org.apache.hadoop.streaming.StreamJob.exitUsage(StreamJob.java:487)
        at org.apache.hadoop.streaming.StreamJob.parseArgv(StreamJob.java:209)
        at org.apache.hadoop.streaming.StreamJob.go(StreamJob.java:111)
        at org.apache.hadoop.streaming.HadoopStreaming.main(HadoopStreaming.java:33)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:155)
        at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

On Wed, Jun 2, 2010 at 8:40 AM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
> Hi,
> You might need to add -Dstream.shipped.hadoopstreaming=<path_to_your_custom_streaming_jar>
> Amogh
> On 6/2/10 5:10 PM, "Mo Zhou" <mozhou@umail.iu.edu> wrote:
> Thank you, Amogh. Elastic MapReduce uses 0.18.3.
> I tried the first way by downloading hadoop-0.18.3 to my local machine.
> Then I got the following warning.
> WARN mapred.JobClient: No job jar file set.  User classes may not be
> found. See JobConf(Class) or JobConf#setJar(String).
> So the results were incorrect.
> Thanks,
> Mo
> On Wed, Jun 2, 2010 at 4:56 AM, Amogh Vasekar <amogh@yahoo-inc.com> wrote:
>> Hi,
>> Depending on which Hadoop version (0.18.3?) EC2 uses, you can try one of the following:
>> 1. Compile the streaming jar with your own custom classes and run on EC2 using
>> this custom jar (should work for 0.18.3; make sure you pick compatible streaming classes).
>> 2. Jar up your classes, pass that jar via the -libjars command-line option, and specify
>> the custom input and output formats as you do on your local machine (should work for 0.19 and later).
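A hedged sketch of option 2 on 0.19 or later, reusing the class names and the fasta.jar name from the thread (the streaming jar path may differ by install):

```shell
# Sketch for Hadoop >= 0.19: ship the custom classes as a separate jar
# via the generic -libjars option instead of rebuilding the streaming jar.
bin/hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-streaming.jar \
    -libjars fasta.jar \
    -input HumanSeqs.4 \
    -output output \
    -mapper "cat -" \
    -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader" \
    -inputformat org.apache.hadoop.streaming.StreamFastaInputFormat
```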
>> I have never worked on EC2, so not sure if any easier solution exists.
>> Amogh
>> On 6/2/10 1:52 AM, "Mo Zhou" <mozhou@umail.iu.edu> wrote:
>> Hi,
>> I know this may not be suitable to post here, since it relates to
>> EC2 more than Hadoop, but I could not find a solution and hope
>> someone here can kindly help me out. Here is my question.
>> I created my own input reader and output formatter to split an input
>> file while using Hadoop streaming. They are tested on my local machine.
>> The following is how I use them.
>> bin/hadoop  jar hadoop-0.20.2-streaming.jar \
>>   -D mapred.map.tasks=4\
>>   -D mapred.reduce.tasks=0\
>>   -input HumanSeqs.4\
>>   -output output\
>>   -mapper "./blastp -db nr -evalue 0.001 -outfmt 6"\
>>   -inputreader "org.apache.hadoop.streaming.StreamFastaRecordReader"\
>>   -inputformat "org.apache.hadoop.streaming.StreamFastaInputFormat"
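The custom record reader above presumably treats each ">"-delimited FASTA entry as one record rather than splitting on newlines. A minimal Python sketch of that splitting logic, with an illustrative function name not taken from the thread:

```python
# Hypothetical sketch of the record-splitting logic a custom FASTA record
# reader performs: a new record begins at every line starting with ">".

def split_fasta(text):
    """Split FASTA-formatted text into one string per '>' record."""
    records = []
    current = []
    for line in text.splitlines():
        if line.startswith(">") and current:
            # A new header line closes the record collected so far.
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records
```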
>> I want to deploy the job to Elastic MapReduce. I first create a
>> streaming job and specify the input and output in S3, the mapper,
>> and the reducer. However, I could not find where to specify
>> -inputreader and -inputformat.
>> So my questions are:
>> 1) how can I upload the class files used as the input reader and
>> input format to Elastic MapReduce?
>> 2) how do I tell the streaming job to use them?
>> Any reply is appreciated. Thanks for your time!
>> --
>> Thanks,
>> Mo
> --
> Thanks,
> Mo

