hadoop-common-dev mailing list archives

From "Milind Bhandarkar (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5979) Streaming partitioner should allow command, not just Java class
Date Tue, 09 Jun 2009 20:17:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717809#action_12717809 ]

Milind Bhandarkar commented on HADOOP-5979:

It would be valuable to have a non-Java streaming partitioner. It should be executed once
per map task, take as input (through stdin) the text-encoded key-value pairs (one per line,
separated by the field separator), and output on stdout a number (again, text-encoded)
for each key-value pair.
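
To make the proposed contract concrete, here is a minimal sketch of what such a partitioner command could look like. It is only an illustration of the stdin/stdout protocol described in this comment, not an existing Streaming feature. The function names, the tab separator, and the CRC32 hash are assumptions chosen for the example; the reducer count is read from the mapred_reduce_tasks environment variable that this comment refers to.

```python
import os
import sys
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition in [0, num_partitions) with a stable hash."""
    # zlib.crc32 is deterministic across processes, unlike Python's hash() on str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def run(lines, out, num_partitions: int, separator: str = "\t") -> None:
    """Write one text-encoded partition number per input key-value line."""
    for line in lines:
        # Everything before the first separator is treated as the key.
        key = line.rstrip("\n").split(separator, 1)[0]
        out.write(f"{partition_for(key, num_partitions)}\n")


def main() -> None:
    # Assumption: the framework exports the reducer count as mapred_reduce_tasks.
    num_partitions = int(os.environ.get("mapred_reduce_tasks", "1"))
    run(sys.stdin, sys.stdout, num_partitions)
```

Saved as a script that ends with a call to main(), this would emit exactly one number per input line, which is the shape of output the comment asks for.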

The number of partitions, i.e. the number of reducers, should already be available to this
streaming partitioner as the environment variable mapred_reduce_tasks. So, no need to pass it in each

A partitioner need not be an "advanced" feature. Think about a parallel bucketing operation,
where the number of buckets is predetermined, so the mapper decides where each value
should go. In this case, the key is a partition ID, and the value is the record to be bucketed.
A hash-based partitioner of course does not work in this case, but a streaming partitioner
such as 'cut -f1' is what is needed.
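
For the bucketing case, the following sketch shows roughly what 'cut -f1' would do as a partitioner, with an added range check. It assumes tab-separated records whose first field is an integer partition ID; the function names are hypothetical, and the range check is an addition that plain 'cut -f1' does not perform.

```python
def first_field(line: str, separator: str = "\t") -> str:
    """Return the text before the first separator, like 'cut -f1'."""
    return line.rstrip("\n").split(separator, 1)[0]


def bucket_partitions(lines, num_partitions: int):
    """Yield the partition ID that the mapper already stored in each key."""
    for line in lines:
        partition = int(first_field(line))
        # Reject IDs that no reducer would be responsible for.
        if not 0 <= partition < num_partitions:
            raise ValueError(f"partition {partition} is outside 0..{num_partitions - 1}")
        yield partition
```

With three buckets, the records "2<TAB>recA" and "0<TAB>recB" would be routed to partitions 2 and 0 respectively, with no hashing involved.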

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java classes to be
> specified as mapper, reducer, and combiner, but the {{-partitioner}} option is still limited
> to Java classes only. Allowing commands to be specified as partitioner as well would greatly
> improve the flexibility of Streaming programs.

This message is automatically generated by JIRA.
You can reply to this email to add a comment to the issue online.