hadoop-common-dev mailing list archives

From "Klaas Bosteels (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5979) Streaming partitioner should allow command, not just Java class
Date Fri, 05 Jun 2009 12:26:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12716599#action_12716599 ]

Klaas Bosteels commented on HADOOP-5979:
----------------------------------------

I haven't thought much about the details yet, but the easiest way to implement it might be
to add a {{PipePartitioner}} that extends {{PipeMapper}}, yes, much like {{PipeCombiner}} is
an extension of {{PipeReducer}}. The {{PipePartitioner}} would have to implement {{Partitioner}},
however, so it would also need an {{int getPartition(Object key, Object value, int numPartitions)}}
method, which could work somewhat like the {{void map(...)}} method.
The way I see it, this method would use {{inWriter_}} to write the key and value to the standard
input of the partitioner command, then rely on {{outReader_}} to read the key and value
returned for this pair and pass them to the {{int getPartition(...)}} method of a wrapped
partitioner. Simplified, it could look something like:

{code}
public int getPartition(K2 key, V2 value, int numPartitions) {
  // Feed the pair to the partitioner command's standard input.
  if (!ignoreKey) {
    inWriter_.writeKey(key);
  }
  inWriter_.writeValue(value);
  // Expect exactly one transformed pair back on its standard output.
  if (!outReader_.readKeyValue()) {
    throw new RuntimeException("partitioner must output one key/value pair for each input pair");
  }
  Object newKey = outReader_.getCurrentKey();
  Object newValue = outReader_.getCurrentValue();
  // Let the wrapped partitioner decide based on the transformed pair.
  return realPartitioner.getPartition(newKey, newValue, numPartitions);
}
{code}

Streaming users could then easily define partitioners by specifying a partitioner command
that transforms key/value pairs in such a way that the wrapped partitioner shows the desired
behavior. The default wrapped partitioner should probably be {{HashPartitioner}}. 
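
To make the idea a bit more concrete, here is a minimal standalone sketch in plain Java with no Hadoop dependency. The class {{WrappedPartitionerSketch}}, the {{PREFIX_COMMAND}} function, and the dotted key format are all hypothetical and only stand in for an external partitioner command; nothing like this exists in Streaming today. The {{hashPartition}} helper uses the same arithmetic as {{HashPartitioner}} (hash code with the sign bit masked off, modulo the number of partitions).

```java
import java.util.function.UnaryOperator;

// Hypothetical sketch: none of these names exist in Hadoop Streaming.
final class WrappedPartitionerSketch {

    // Same arithmetic as Hadoop's HashPartitioner:
    // mask off the sign bit, then take the modulus.
    static int hashPartition(Object key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Stand-in for the external partitioner command (an assumption):
    // it keeps only the part of the key before the first '.'.
    static final UnaryOperator<String> PREFIX_COMMAND = key -> {
        int dot = key.indexOf('.');
        return dot < 0 ? key : key.substring(0, dot);
    };

    // Rewrite the key with the "command", then let the wrapped
    // hash partitioner pick the partition for the rewritten key.
    static int getPartition(String key, int numPartitions,
                            UnaryOperator<String> command) {
        String newKey = command.apply(key);
        return hashPartition(newKey, numPartitions);
    }
}
```

With this stand-in command, keys such as {{user42.click}} and {{user42.view}} are both rewritten to {{user42}} before hashing, so they end up in the same partition even though the wrapped partitioner itself knows nothing about the key format.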

Does this make sense to you, Devaraj?

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java classes to be
> specified as mapper, reducer, and combiner, but the {{-partitioner}} option is still limited
> to Java classes only. Allowing commands to be specified as partitioner as well would greatly
> improve the flexibility of Streaming programs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

