flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (FLINK-1267) Add crossGroup operator
Date Thu, 14 Jan 2016 20:31:39 GMT

     [ https://issues.apache.org/jira/browse/FLINK-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Fabian Hueske updated FLINK-1267:
    Component/s:     (was: Java API)
                     (was: Scala API)
                 DataSet API

> Add crossGroup operator
> -----------------------
>                 Key: FLINK-1267
>                 URL: https://issues.apache.org/jira/browse/FLINK-1267
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataSet API, Local Runtime, Optimizer
>    Affects Versions: 0.7.0-incubating
>            Reporter: Fabian Hueske
>            Assignee: pietro pinoli
>            Priority: Minor
> A common operator is to pair-wise compare or combine all elements of a group (there were
two questions about this on the user mailing list, recently). Right now, this can be done
in two ways:
> 1. {{groupReduce}}: consume and store the complete iterator in memory and build all pairs
> 2. do a self-{{Join}}: the engine builds all pairs of the full symmetric Cartesian product.
> Both approaches have drawbacks. The {{groupReduce}} variant requires that the full group
fits into memory and is more cumbersome to implement for the user, but pairs can be arbitrarily
built. The self-{{Join}} approach pushes most of the work into the system, but the execution
strategy does not treat the self-join different from a regular join (both identical inputs
are shuffled, etc.) and always builds the full symmetric Cartesian product.
> I propose to add a dedicated {{crossGroup()}} operator, that offers this functionality
in a proper way.

This message was sent by Atlassian JIRA

View raw message