hadoop-common-dev mailing list archives

From "Klaas Bosteels (JIRA)" <j...@apache.org>
Subject [jira] Commented: (HADOOP-5979) Streaming partitioner should allow command, not just Java class
Date Tue, 09 Jun 2009 07:48:07 GMT

    [ https://issues.apache.org/jira/browse/HADOOP-5979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12717591#action_12717591 ]

Klaas Bosteels commented on HADOOP-5979:
----------------------------------------

Yeah, I was actually suggesting such a special Java implementation that writes to and reads
from a command, but instead of letting the command generate the partition number directly,
I thought it might make sense to let it output a key or even a key/value pair (which are completely
separate from the other MapReduce keys and values) and determine the partition from that.
So instead of generating the same number for pairs that need to go to the same reducer, the
partitioner command would just have to generate the same key for those pairs. The benefits
of such an approach would be that
# it's simpler (the partitioner command doesn't need to know how many partitions there are),
# it might be easier to define a suitable partitioner command (when using shell tools it might
be easier to output a string instead of a specific number for example),
# we could reuse more code that's already there (if we let the partitioner command output
both a key and a value and pass them on to a wrapped partitioner, like in the code sample
I gave above, we wouldn't even need any additional reading/writing logic).
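To make the idea a bit more concrete, here is a minimal, self-contained sketch of such a command-backed partitioner. It is only an illustration under several assumptions: {{Partitioner}} is a hypothetical stand-in that mirrors the shape of Hadoop's {{getPartition(key, value, numPartitions)}} so the example runs without Hadoop on the classpath; {{CommandPartitioner}} is a hypothetical name; the tab-separated "key\tvalue" line protocol is assumed; and the wrapped {{HashPartitioner}} uses the same {{(hashCode & Integer.MAX_VALUE) % numPartitions}} formula as Hadoop's. Note that the command only has to emit the same key for records that belong together, and never needs to know the number of partitions.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class CommandPartitionerSketch {

    /** Hypothetical stand-in mirroring the shape of Hadoop's Partitioner. */
    interface Partitioner<K, V> {
        int getPartition(K key, V value, int numPartitions);
    }

    /** Same formula as Hadoop's HashPartitioner. */
    static class HashPartitioner<K, V> implements Partitioner<K, V> {
        public int getPartition(K key, V value, int numPartitions) {
            return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
        }
    }

    /**
     * Hypothetical command-backed partitioner: pipes "key\tvalue" to a command,
     * reads back one line "pkey[\tpvalue]" and hands that pair to a wrapped partitioner.
     */
    static class CommandPartitioner implements Partitioner<String, String> {
        private final String[] command;
        private final Partitioner<String, String> wrapped;

        CommandPartitioner(String[] command, Partitioner<String, String> wrapped) {
            this.command = command;
            this.wrapped = wrapped;
        }

        public int getPartition(String key, String value, int numPartitions) {
            String line = runCommand(key + "\t" + value + "\n");
            int tab = line.indexOf('\t');
            String pKey = tab < 0 ? line : line.substring(0, tab);
            String pValue = tab < 0 ? "" : line.substring(tab + 1);
            // The wrapped partitioner only ever sees the command's output pair.
            return wrapped.getPartition(pKey, pValue, numPartitions);
        }

        // Starts one process per record to keep the sketch short; a practical
        // version would keep a single process alive and stream records through it.
        private String runCommand(String input) {
            try {
                Process p = new ProcessBuilder(command).start();
                try (OutputStream in = p.getOutputStream()) {
                    in.write(input.getBytes(StandardCharsets.UTF_8));
                } // closing stdin lets the command finish
                String line;
                try (BufferedReader out = new BufferedReader(
                        new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                    line = out.readLine();
                }
                p.waitFor();
                return line == null ? "" : line;
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        // Partition on the first tab-separated field of the key (e.g. a user id).
        CommandPartitioner p = new CommandPartitioner(
                new String[] {"cut", "-f1"}, new HashPartitioner<>());
        int a = p.getPartition("user42\t2009-06-09", "click", 4);
        int b = p.getPartition("user42\t2009-06-10", "view", 4);
        System.out.println(a == b); // prints "true": both records map to key "user42"
    }
}
```

Here a plain shell tool like {{cut -f1}} acts as the partitioner command, which is exactly the kind of string-emitting command that would be awkward to write if it had to produce partition numbers itself.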

> Streaming partitioner should allow command, not just Java class
> ---------------------------------------------------------------
>
>                 Key: HADOOP-5979
>                 URL: https://issues.apache.org/jira/browse/HADOOP-5979
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: contrib/streaming
>            Reporter: Klaas Bosteels
>
> Since HADOOP-4842 got committed, Streaming allows both commands and Java classes to be
> specified as mapper, reducer, and combiner, but the {{-partitioner}} option is still limited
> to Java classes only. Allowing commands to be specified as partitioner as well would greatly
> improve the flexibility of Streaming programs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

