hadoop-mapreduce-issues mailing list archives

From "Dieter Plaetinck (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (MAPREDUCE-2410) document multiple keys per reducer oddity in hadoop streaming FAQ
Date Wed, 11 May 2011 13:39:47 GMT

    [ https://issues.apache.org/jira/browse/MAPREDUCE-2410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13031694#comment-13031694 ]

Dieter Plaetinck commented on MAPREDUCE-2410:
---------------------------------------------

I think that's very well and concisely explained, but to make it really clear to beginners
I would add after the last line:
"A practical consequence of this is that reducers for streaming need to be able to deal with
different input keys"

Or even:
"A practical consequence of this is that reducers for streaming need to be able to deal with
different input keys, although some projects provide a one-key-at-a-time abstraction on top
of the streaming API, such as Dumbo for Python programmers [*]"

[*] https://github.com/klbostee/dumbo/wiki/Short-tutorial
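
To illustrate what "dealing with different input keys" means in practice, here is a
minimal sketch of a streaming reducer in Python. It assumes the input arrives on stdin
as tab-separated key/value lines already sorted by key, and that the values are integer
counts to be summed. The names parse and reduce_stream are only illustrative and are not
part of any Hadoop API.

```python
import sys
from itertools import groupby


def parse(lines):
    """Yield (key, value) pairs from tab-separated lines, skipping blank lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        # Everything before the first tab is the key, the rest is the value.
        key, _, value = line.partition("\t")
        yield key, value


def reduce_stream(lines):
    """Sum the integer values of each key.

    Assumption: the lines are sorted by key, so all records for one key
    are adjacent and groupby can detect the key boundaries.
    """
    for key, group in groupby(parse(lines), key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


if __name__ == "__main__":
    # One reducer process sees many keys, so it emits one line per key.
    for key, total in reduce_stream(sys.stdin):
        print(f"{key}\t{total}")
```

The grouping step is the part the Java API does for you by calling reduce() once per
key; in a streaming reducer the script has to detect the key boundaries itself.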


> document multiple keys per reducer oddity in hadoop streaming FAQ
> -----------------------------------------------------------------
>
>                 Key: MAPREDUCE-2410
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-2410
>             Project: Hadoop Map/Reduce
>          Issue Type: Improvement
>          Components: contrib/streaming, documentation
>    Affects Versions: 0.20.2
>            Reporter: Dieter Plaetinck
>            Assignee: Harsh J Chouraria
>            Priority: Minor
>              Labels: newbie
>             Fix For: 0.23.0
>
>         Attachments: MAPREDUCE-2410.r1.diff
>
>   Original Estimate: 40m
>  Remaining Estimate: 40m
>
> Hi,
> For a newcomer to Hadoop streaming, it comes as a surprise that a reducer process receives
> arbitrary keys on its input, unlike "real" Hadoop, where each reduce() call works on a single key.
> An explanation for this is at http://mail-archives.apache.org/mod_mbox/hadoop-common-user/201103.mbox/browser
> I suggest adding this to the Hadoop streaming FAQ.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
