crunch-dev mailing list archives

From "Micah Whitacre (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (CRUNCH-192) Document and enforce the semantics around reducer-based Iterables
Date Mon, 08 Apr 2013 22:34:16 GMT

    [ https://issues.apache.org/jira/browse/CRUNCH-192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13625914#comment-13625914 ]

Micah Whitacre commented on CRUNCH-192:
---------------------------------------

Fair enough.  Logged CRUNCH-194.
                
> Document and enforce the semantics around reducer-based Iterables
> -----------------------------------------------------------------
>
>                 Key: CRUNCH-192
>                 URL: https://issues.apache.org/jira/browse/CRUNCH-192
>             Project: Crunch
>          Issue Type: Bug
>            Reporter: Gabriel Reid
>         Attachments: CRUNCH-192.patch
>
>
> As reported on user@crunch.apache.org by Chad Urso McDaniel:
> BLUF: The Iterable parameter to CombineFn.process implies you can iterate multiple times when you cannot, and this leads to surprising behavior.
> As many of you probably know, the signature of CombineFn.process is 
> ---
> process(Pair<K, Iterable<V>> input, Emitter<Pair<K, V>> emitter)
> ---
> The corresponding Hadoop Reducer signature is
> ---
> reduce(K2 key, Iterator<V2> values, OutputCollector<K3,V3> output, Reporter reporter)
> ---
> I assume the Crunch use of Iterable is for convenient use in "for" loops.
> Unfortunately, this Iterable seems to return the same Iterator object each time Iterable.iterator() is called.
> This makes sense to me given the underlying Hadoop MapReduce, but it violates what I think most people expect from the Iterable interface.
> I understand that it's too late to change the interface, but could we at least have a javadoc note, or an exception thrown if the Iterable is used more than once?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see: http://www.atlassian.com/software/jira
