flink-issues mailing list archives

From "Stephan Ewen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-1267) Add crossGroup operator
Date Fri, 21 Nov 2014 11:05:33 GMT

    [ https://issues.apache.org/jira/browse/FLINK-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220775#comment-14220775 ]

Stephan Ewen commented on FLINK-1267:

I think it may be useful, but it is not really urgent at this point.

The reason is that the groupReduce variant should be fine in virtually all cases. If you have
so many elements that the groupReduce variant runs out of memory, you probably will not be
able to compute the cross product of that group with itself in any reasonable time anyways...
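
To make the argument concrete, here is a minimal sketch of what the groupReduce variant does inside the user function. It is plain Java and does not use the Flink API; the class GroupPairs and the methods allPairs and pairCount are hypothetical names chosen for illustration. The group is buffered in memory, then every unordered pair is emitted once. The memory cost is linear in the group size n, while the number of pairs is n * (n - 1) / 2, so the pair computation is likely to become the bottleneck before the buffer does.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical helper, not part of the Flink API: it mimics the body of a
// groupReduce function that receives one group as an iterator.
final class GroupPairs {

    /** Buffers the whole group in memory, then emits every unordered pair once. */
    static <T> List<List<T>> allPairs(Iterator<T> group) {
        // Step 1: consume and store the complete iterator (O(n) memory).
        List<T> buffer = new ArrayList<>();
        while (group.hasNext()) {
            buffer.add(group.next());
        }
        // Step 2: build all pairs (O(n^2) work and output).
        List<List<T>> pairs = new ArrayList<>();
        for (int i = 0; i < buffer.size(); i++) {
            for (int j = i + 1; j < buffer.size(); j++) {
                pairs.add(List.of(buffer.get(i), buffer.get(j)));
            }
        }
        return pairs;
    }

    /** Number of unordered pairs for a group of n elements: n * (n - 1) / 2. */
    static long pairCount(long n) {
        return n * (n - 1) / 2;
    }

    public static void main(String[] args) {
        // prints [[a, b], [a, c], [b, c]]
        System.out.println(allPairs(List.of("a", "b", "c").iterator()));
        // prints 499999500000
        System.out.println(pairCount(1_000_000L));
    }
}
```

A group of one million elements is easy to buffer, but it already yields roughly 5 * 10^11 pairs.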

> Add crossGroup operator
> -----------------------
>                 Key: FLINK-1267
>                 URL: https://issues.apache.org/jira/browse/FLINK-1267
>             Project: Flink
>          Issue Type: New Feature
>          Components: Java API, Local Runtime, Optimizer, Scala API
>    Affects Versions: 0.7.0-incubating
>            Reporter: Fabian Hueske
>            Priority: Minor
>
> A common operation is to pair-wise compare or combine all elements of a group (there were
> two questions about this on the user mailing list recently). Right now, this can be done
> in two ways:
> 1. {{groupReduce}}: consume and store the complete iterator in memory and build all pairs
> 2. do a self-{{Join}}: the engine builds all pairs of the full symmetric Cartesian product.
> Both approaches have drawbacks. The {{groupReduce}} variant requires that the full group
> fits into memory and is more cumbersome for the user to implement, but pairs can be built
> arbitrarily. The self-{{Join}} approach pushes most of the work into the system, but the
> execution strategy does not treat the self-join differently from a regular join (both
> identical inputs are shuffled, etc.) and always builds the full symmetric Cartesian product.
> I propose to add a dedicated {{crossGroup()}} operator that offers this functionality
> in a proper way.
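
As a rough illustration of the "full symmetric Cartesian product" mentioned in the quoted description, the following plain-Java sketch contrasts what a self-join produces for a single group with the pairs a pair-wise comparison typically needs. It does not use the Flink API; SelfJoinSketch, fullProduct, and distinctPairs are hypothetical names, and the sketch ignores shuffling and all other runtime aspects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this does not call into Flink.
final class SelfJoinSketch {

    /** What a self-join yields per group: all n * n ordered pairs, including (x, x). */
    static <T> List<List<T>> fullProduct(List<T> group) {
        List<List<T>> out = new ArrayList<>();
        for (T left : group) {
            for (T right : group) {
                out.add(List.of(left, right));
            }
        }
        return out;
    }

    /** What a pair-wise comparison usually needs: each unordered pair exactly once. */
    static <T> List<List<T>> distinctPairs(List<T> group) {
        List<List<T>> out = new ArrayList<>();
        for (int i = 0; i < group.size(); i++) {
            for (int j = i + 1; j < group.size(); j++) {
                out.add(List.of(group.get(i), group.get(j)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> group = List.of("a", "b", "c");
        System.out.println(fullProduct(group).size());   // prints 9
        System.out.println(distinctPairs(group).size()); // prints 3
    }
}
```

For a group of n elements the self-join materializes n * n pairs, whereas only n * (n - 1) / 2 are needed when order does not matter and self-pairs are excluded.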

This message was sent by Atlassian JIRA
