hadoop-common-dev mailing list archives

From "Runping Qi (JIRA)" <j...@apache.org>
Subject [jira] Updated: (HADOOP-1290) Move Hadoop Abacus to hadoop.mapred.lib
Date Wed, 25 Apr 2007 18:09:15 GMT

     [ https://issues.apache.org/jira/browse/HADOOP-1290?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Runping Qi updated HADOOP-1290:

    Attachment: patch-1284.txt

This patch implemented the proposed protocol.

With this patch, the streaming user can specify a field separator for the mapper's output
and/or a field separator for the reducer's output. The default is the tab character.

The user can also specify how many fields in the output constitute the key. The default is
The rest of the line will be the value.
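
As an illustration of the key/value split described above, here is a minimal Python sketch. It is not the code in the patch. The function name split_key_value and its parameters are hypothetical, the number of key fields is passed explicitly, and the handling of a line with too few separators (whole line becomes the key, empty value) is an assumption.

```python
def split_key_value(line, num_key_fields, separator="\t"):
    """Split one output line into (key, value).

    The first num_key_fields fields form the key; the rest of the
    line is the value. Hypothetical helper, not part of the patch.
    """
    line = line.rstrip("\n")
    # Split at most num_key_fields times so the value keeps its separators.
    parts = line.split(separator, num_key_fields)
    if len(parts) <= num_key_fields:
        # Assumption: with too few separators the whole line is the key.
        return line, ""
    key = separator.join(parts[:num_key_fields])
    value = parts[num_key_fields]
    return key, value
```

For example, split_key_value("a\tb\tc\td", 2) treats "a\tb" as the key and "c\td" as the value.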

A partitioner class, KeyFieldBasedPartitioner in mapred.lib, is also implemented.
The user can specify how many fields of the map output key
will be used for partitioning.
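
The idea of partitioning on a prefix of the key can be sketched in Python as follows. This is only an illustration of the technique, not KeyFieldBasedPartitioner itself. The function name, its parameters, and the use of CRC32 as the hash are assumptions made for the example.

```python
import zlib


def partition_for_key(key, num_partition_fields, num_partitions, separator="\t"):
    """Pick a partition from the first num_partition_fields fields of key.

    Hypothetical helper. CRC32 is used only because it is stable across
    runs; the hash function used by the patch may differ.
    """
    fields = key.split(separator)
    prefix = separator.join(fields[:num_partition_fields])
    return zlib.crc32(prefix.encode("utf-8")) % num_partitions
```

Keys that share the same leading fields, such as "user1\t2007-04-25" and "user1\t2007-04-26" with one partition field, land in the same partition, while the remaining fields are still available for sorting.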

Also a utility class, FieldSelectionMapReduce in mapred.lib, is added. This class allows the
user to create map/reduce jobs that manipulate text data like the Unix cut utility.
The user can specify the field separator (the delimiter for cut), which fields to select,
and by which fields to partition/sort.
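
A cut-like field selection can be sketched in Python as below. This is not the FieldSelectionMapReduce API. The function name is hypothetical, and the 0-based, non-negative field indices are an assumption of the example (Unix cut numbers fields from 1).

```python
def select_fields(line, indices, separator="\t"):
    """Return the selected fields of line, joined by separator.

    indices are 0-based, non-negative positions (an assumption of this
    sketch); positions past the end of the line are skipped.
    """
    fields = line.rstrip("\n").split(separator)
    return separator.join(fields[i] for i in indices if i < len(fields))
```

For instance, select_fields("a:b:c", [2, 0], ":") selects the third and first fields, in that order.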

Two unit tests are introduced.
All the unit tests passed.

> Move Hadoop Abacus to hadoop.mapred.lib
> ---------------------------------------
>                 Key: HADOOP-1290
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1290
>             Project: Hadoop
>          Issue Type: Improvement
>            Reporter: Runping Qi
> Owen and I discussed this issue and we both felt that it is appropriate to move Hadoop
> Abacus to the hadoop main framework.
> Any comments/thoughts/concerns/objections?

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.
