flink-issues mailing list archives

From "Fabian Hueske (JIRA)" <j...@apache.org>
Subject [jira] [Created] (FLINK-5722) Implement DISTINCT as dedicated operator
Date Mon, 06 Feb 2017 13:55:41 GMT
Fabian Hueske created FLINK-5722:

             Summary: Implement DISTINCT as dedicated operator
                 Key: FLINK-5722
                 URL: https://issues.apache.org/jira/browse/FLINK-5722
             Project: Flink
          Issue Type: Improvement
          Components: Table API & SQL
    Affects Versions: 1.2.0, 1.3.0
            Reporter: Fabian Hueske

DISTINCT is currently implemented for batch Table API / SQL as an aggregate that groups on
all fields. Grouped aggregates are implemented as a GroupReduce with a sort-based combiner.
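To make the current plan concrete, here is a minimal plain-Java model (not Flink code) of DISTINCT as an all-field grouping with a sort-based combine step. Each partition is sorted on the whole row, and the first row of every run of equal rows is kept. Modeling rows as `List<String>` is an assumption, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java model of DISTINCT as "group on all fields, emit one row per group"
// with a sort-based combine. Rows are modeled as List<String> (an assumption).
class SortBasedDistinct {

    // Lexicographic comparator over all fields: the grouping key is the whole row.
    static final Comparator<List<String>> ALL_FIELDS = (a, b) -> {
        int n = Math.min(a.size(), b.size());
        for (int i = 0; i < n; i++) {
            int c = a.get(i).compareTo(b.get(i));
            if (c != 0) return c;
        }
        return Integer.compare(a.size(), b.size());
    };

    // Sort the partition, then keep the first row of every run of equal rows.
    static List<List<String>> distinct(List<List<String>> partition) {
        List<List<String>> sorted = new ArrayList<>(partition);
        sorted.sort(ALL_FIELDS);
        List<List<String>> out = new ArrayList<>();
        for (List<String> row : sorted) {
            if (out.isEmpty() || ALL_FIELDS.compare(out.get(out.size() - 1), row) != 0) {
                out.add(row);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("b", "2"),
            Arrays.asList("a", "1"),
            Arrays.asList("b", "2"),
            Arrays.asList("a", "1"));
        System.out.println(distinct(rows)); // [[a, 1], [b, 2]]
    }
}
```

The sort is the cost the proposal targets: every partition is fully sorted even though DISTINCT only needs to recognize equal rows.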

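This operator can be implemented more efficiently with a ReduceFunction and a hint for the
hash-based combine strategy. Because all fields are grouping keys, the ReduceFunction only has
to return one of two identical records, so the same ReduceFunction can be used for all DISTINCT
operations and can be annotated with the appropriate forwarded-field annotations.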
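For contrast, a minimal plain-Java sketch of the proposed approach, assuming rows implement `equals`/`hashCode` over all fields. `Reducer` is a hypothetical local interface that mirrors the shape of Flink's `ReduceFunction#reduce(T, T)` so the sketch runs without Flink on the classpath, and the combine step is modeled as a hash table keyed on the whole row. In Flink's DataSet API the corresponding hint is set with `setCombineHint(CombineHint.HASH)` on a reduce operator.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class HashCombineDistinct {

    // Hypothetical stand-in mirroring the shape of Flink's ReduceFunction<T>#reduce(T, T).
    interface Reducer<T> {
        T reduce(T a, T b);
    }

    // With the whole row as the key, two rows in one group are equal, so the reducer
    // can return either input unchanged. One instance works for any row type, and
    // since the output row is an input row, every field is forwarded as-is.
    static <T> Reducer<T> keepFirst() {
        return (a, b) -> a;
    }

    // Hash-based combine: one table entry per distinct row, reducing on key collision.
    // No sorting is needed; in this model memory grows with the number of distinct rows.
    static <T> List<T> combine(List<T> partition, Reducer<T> reducer) {
        Map<T, T> table = new LinkedHashMap<>();
        for (T row : partition) {
            table.merge(row, row, reducer::reduce);
        }
        return new ArrayList<>(table.values());
    }

    public static void main(String[] args) {
        List<List<String>> rows = Arrays.asList(
            Arrays.asList("b", "2"),
            Arrays.asList("a", "1"),
            Arrays.asList("b", "2"));
        System.out.println(combine(rows, keepFirst())); // [[b, 2], [a, 1]]
    }
}
```

Because `keepFirst()` never inspects or modifies a field, it is independent of the schema, which is why a single function can serve every DISTINCT.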

We would need a custom conversion rule which translates distinct aggregations (grouping on
all fields and returning all fields) into a custom DataSetRelNode.
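The matching condition such a rule would check can be sketched as follows. `Aggregate` here is a hypothetical, simplified model rather than Calcite's class; a real rule would read the equivalent information from Calcite's `Aggregate#getGroupSet()`, `Aggregate#getAggCallList()`, and the input's row type.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class DistinctRuleMatch {

    // Hypothetical simplified stand-in for a logical aggregate node.
    static final class Aggregate {
        final List<Integer> groupFields; // indexes of the grouping fields
        final int aggCallCount;          // number of aggregate calls (SUM, COUNT, ...)
        final int inputFieldCount;       // number of fields in the input row

        Aggregate(List<Integer> groupFields, int aggCallCount, int inputFieldCount) {
            this.groupFields = groupFields;
            this.aggCallCount = aggCallCount;
            this.inputFieldCount = inputFieldCount;
        }
    }

    // A distinct aggregation computes no aggregates and groups on every input field
    // in input order, so its output row is exactly its input row.
    static boolean isDistinct(Aggregate agg) {
        if (agg.aggCallCount != 0) {
            return false;
        }
        List<Integer> allFields = IntStream.range(0, agg.inputFieldCount)
            .boxed()
            .collect(Collectors.toList());
        return agg.groupFields.equals(allFields);
    }

    public static void main(String[] args) {
        System.out.println(isDistinct(new Aggregate(Arrays.asList(0, 1, 2), 0, 3))); // true
        System.out.println(isDistinct(new Aggregate(Arrays.asList(0, 1), 0, 3)));    // false: not all fields
        System.out.println(isDistinct(new Aggregate(Arrays.asList(0, 1, 2), 1, 3))); // false: has an aggregate
    }
}
```

Only aggregates that pass this check would be rewritten into the dedicated DISTINCT node; all other grouped aggregates keep the existing GroupReduce translation.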

This message was sent by Atlassian JIRA
