hadoop-common-dev mailing list archives

From "Klaas Bosteels (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5979) Streaming partitioner should allow command, not just Java class
Date Fri, 05 Jun 2009 12:26:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716599#action_12716599 ]

Klaas Bosteels commented on HADOOP-5979:

I haven't thought much about the details yet, but the easiest way to implement it might be
to add a {{PipePartitioner}} that extends {{PipeMapper}}, yes, much like {{PipeCombiner}} is
an extension of {{PipeReducer}}. The {{PipePartitioner}} would have to implement {{Partitioner}},
however, so it would also need an {{int getPartition(Object key, Object value, int numPartitions)}}
method, which could work somewhat similarly to the {{void map(...)}} method.
The way I see it, this method would use {{inWriter_}} to write the key and value to the standard
input of the partitioner command, then rely on {{outReader_}} to read back the key and value
the command returns for this pair, and finally pass those to the {{int getPartition(...)}} method
of a wrapped partitioner. Simplified, it could look something like:

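public int getPartition(K2 key, V2 value, int numPartitions) {
  // Feed the pair to the partitioner command, as PipeMapper.map(...) does.
  if (!ignoreKey) {
    inWriter_.writeKey(key);
  }
  inWriter_.writeValue(value);
  // Read back the transformed pair from the command's standard output.
  if (!outReader_.readKeyValue()) {
    throw new RuntimeException("partitioner must output one key/val pair for each input pair");
  }
  Object newKey = outReader_.getCurrentKey();
  Object newValue = outReader_.getCurrentValue();
  return realPartitioner.getPartition(newKey, newValue, numPartitions);
}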

Streaming users could then easily define partitioners by specifying a partitioner command
that transforms key/value pairs in such a way that the wrapped partitioner shows the desired
behavior. The default wrapped partitioner should probably be {{HashPartitioner}}. 
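
To illustrate the idea without any Hadoop dependencies, here is a minimal, self-contained sketch. Everything in it is hypothetical and only for illustration: the class {{PartitionSketch}}, the {{keyPrefix}} function standing in for an external partitioner command, and the {{hashPartition}} helper, which only mimics the hash-and-modulo scheme of a hash-based partitioner. It shows how a key transformation placed in front of a hash partitioner could send all records that share a key prefix to the same partition.

```java
import java.util.function.UnaryOperator;

public class PartitionSketch {

    // Stand-in for the wrapped partitioner: non-negative hash modulo partition count.
    static int hashPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Stand-in for the partitioner command (hypothetical): keep only the part
    // of the key before the first ':' so related keys collapse to one key.
    static String keyPrefix(String key) {
        int idx = key.indexOf(':');
        return idx < 0 ? key : key.substring(0, idx);
    }

    // Transform the key first, then let the wrapped partitioner decide.
    static int getPartition(String key, UnaryOperator<String> transform, int numPartitions) {
        return hashPartition(transform.apply(key), numPartitions);
    }

    public static void main(String[] args) {
        int numPartitions = 8;
        String[] keys = {"user42:2009-06-05", "user42:2009-06-06", "user7:2009-06-05"};
        for (String key : keys) {
            int p = getPartition(key, PartitionSketch::keyPrefix, numPartitions);
            System.out.println(key + " -> partition " + p);
        }
    }
}
```

In this sketch both {{user42:...}} keys end up in the same partition because the transformation reduces them to the same key before hashing, which is the kind of behavior a partitioner command would be expected to produce.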

Does this make sense to you, Devaraj?

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
> Since HADOOP-4842 got committed, Streaming allows both commands and Java classes to be
> specified as mapper, reducer, and combiner, but the {{-partitioner}} option is still limited
> to Java classes only. Allowing commands to be specified as partitioner as well would greatly
> improve the flexibility of Streaming programs.

