hadoop-common-dev mailing list archives

From "Michel Tourn (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-191) add hadoopStreaming to src/contrib
Date Tue, 02 May 2006 22:18:47 GMT
    [ http://issues.apache.org/jira/browse/HADOOP-191?page=comments#action_12377480 ] 

Michel Tourn commented on HADOOP-191:

The usage message:


Usage: hadoopStreaming [options]
  -input   <path>     DFS input file(s) for the Map step
  -output  <path>     DFS output directory for the Reduce step
  -mapper  <cmd>      The streaming command to run
  -reducer <cmd>      The streaming command to run
  -files   <file>     Additional files to be shipped in the Job jar file
  -cluster <name>     Default uses hadoop-default.xml and hadoop-site.xml
  -config  <file>     Optional. One or more paths to xml config files
  -inputreader <spec> Optional. See below

In -input: globbing on <path> is supported and can have multiple -input
Default Map input format: a line is a record in UTF-8
  the key part ends at first TAB, the rest of the line is the value
Custom Map input format: -inputreader package.MyRecordReader,n=v,n=v
  comma-separated name-values can be specified to configure the InputFormat
  Ex: -inputreader 'StreamXmlRecordReader,begin=<doc>,end=</doc>'
Map output format, reduce input/output format:
  Format defined by what mapper command outputs. Line-oriented
Mapper and Reducer <cmd> syntax:
  If the mapper or reducer programs are prefixed with noship: then
  the paths are assumed to be valid absolute paths on the task tracker machines
  and are NOT packaged with the Job jar file.
Use -cluster <name> to switch between "local" Hadoop and one or more remote
  Hadoop clusters.
  The default is to use the normal hadoop-default.xml and hadoop-site.xml
  Else configuration will use $HADOOP_HOME/conf/hadoop-<name>.xml

Example: hadoopStreaming -mapper "noship:/usr/local/bin/perl5 filter.pl"
           -files /local/filter.pl -input "/logs/0604*/*" [...]
  Ships a script, invokes the non-shipped perl interpreter
  Shipped files go to the working directory so filter.pl is found by perl
  Input files are all the daily logs for days in month 2006-04
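
To illustrate the default Map input format described in the usage message (a line is a record, the key ends at the first TAB, the rest of the line is the value), here is a minimal Python sketch. It is not part of the patch. The names split_record and records are hypothetical, and the treatment of a line with no TAB (whole line becomes the key, value is empty) is an assumption, since the usage message does not cover that case.

```python
def split_record(line):
    """Split one text record into (key, value) at the first TAB.

    Assumption: a line with no TAB is treated as key-only with an
    empty value; the usage message does not specify this case.
    """
    line = line.rstrip("\r\n")  # drop trailing newline characters
    key, _sep, value = line.partition("\t")  # split at the first TAB only
    return key, value


def records(lines):
    """Yield (key, value) pairs for an iterable of text lines."""
    for line in lines:
        yield split_record(line)
```

A script could pass any iterable of lines, for example records(sys.stdin) after importing sys. Only the first TAB separates key from value, so later TABs stay in the value.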

> add hadoopStreaming to src/contrib
> ----------------------------------
>          Key: HADOOP-191
>          URL: http://issues.apache.org/jira/browse/HADOOP-191
>      Project: Hadoop
>         Type: New Feature

>     Reporter: Michel Tourn
>     Assignee: Doug Cutting
>  Attachments: streaming.patch
> This is a patch that adds a src/contrib/hadoopStreaming directory to the source tree.
> hadoopStreaming is a bridge to run non-Java code as Map/Reduce tasks.
> The unit test TestStreaming runs the Unix tools tr (as Map) and uniq (as Reduce)
> To test the patch:
> Merge the patch.
> The only existing file that is modified is trunk/build.xml
> trunk>ant deploy-contrib
> trunk>bin/hadoopStreaming : should show usage message
> trunk>ant test-contrib    : should run one test successfully
> To add src/contrib/someOtherProject:
> edit src/contrib/build.xml

This message is automatically generated by JIRA.
