flink-issues mailing list archives

From "Greg Hogan (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-3279) Optionally disable DistinctOperator combiner
Date Fri, 15 Jul 2016 16:01:21 GMT

    [ https://issues.apache.org/jira/browse/FLINK-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15379606#comment-15379606 ]

Greg Hogan commented on FLINK-3279:

I fixed the wording of my comment. I think Fabian's suggestion was to investigate changing
{{DistinctOperator}} from using {{GroupReduce}} to using {{Reduce}}. Then we could add {{setCombineHint}}
to {{DistinctOperator}} rather than my suggestion above.
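
A minimal sketch of the idea in this comment, written as plain Java rather than Flink code. It expresses distinct as a reduce that keeps the first of two equal records, and uses a hint to switch the per-partition pre-aggregation on or off. The class {{ReduceDistinctSketch}}, the {{Hint}} enum and all method names are hypothetical and are not part of the Flink API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

// Plain-Java model, not Flink code: every name here is hypothetical.
class ReduceDistinctSketch {

    // Stand-in for a combine hint: NONE skips the per-partition pre-aggregation.
    enum Hint { HASH, NONE }

    // Distinct as a reduce: of two equal records, keep the first.
    static <T> BinaryOperator<T> keepFirst() {
        return (first, second) -> first;
    }

    // Reduces records that compare equal, preserving first-seen order.
    static <T> List<T> reduceByValue(List<T> records, BinaryOperator<T> fn) {
        Map<T, T> groups = new LinkedHashMap<>();
        for (T record : records) {
            groups.merge(record, record, fn);
        }
        return new ArrayList<>(groups.values());
    }

    // Optionally pre-reduces each partition, then reduces the concatenation.
    static <T> List<T> distinct(List<List<T>> partitions, Hint hint) {
        List<T> shuffled = new ArrayList<>();
        for (List<T> partition : partitions) {
            shuffled.addAll(hint == Hint.NONE
                    ? partition
                    : reduceByValue(partition, keepFirst()));
        }
        return reduceByValue(shuffled, keepFirst());
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "b", "b"),
                Arrays.asList("b", "c", "a"));
        // Both settings yield the same distinct records; only the amount of
        // per-partition work and shuffled data differs.
        System.out.println(distinct(partitions, Hint.HASH));
        System.out.println(distinct(partitions, Hint.NONE));
    }
}
```

In this model the hint changes only how much work happens before the final reduce, never the result, which is why it could be exposed as an optional setting.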

> Optionally disable DistinctOperator combiner
> --------------------------------------------
>                 Key: FLINK-3279
>                 URL: https://issues.apache.org/jira/browse/FLINK-3279
>             Project: Flink
>          Issue Type: New Feature
>          Components: DataSet API
>    Affects Versions: 1.0.0
>            Reporter: Greg Hogan
>            Assignee: Greg Hogan
>            Priority: Minor
> Calling {{DataSet.distinct()}} executes {{DistinctOperator.DistinctFunction}} which is
> a combinable {{RichGroupReduceFunction}}. Sometimes we know that there will be few duplicate
> records and disabling the combine would improve performance.
> I propose adding {{public DistinctOperator<T> setCombinable(boolean combinable)}}
> to {{DistinctOperator}}.
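
The trade-off described in the issue can be illustrated with a small plain-Java model, under the assumption that the combiner's benefit is the number of records it removes before the shuffle. {{CombinerCostSketch}} and {{recordsShuffled}} are hypothetical names, and the {{combinable}} flag only mirrors the proposed setter. Nothing here calls Flink.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;

// Plain-Java model, not Flink code: the class and method names are hypothetical.
class CombinerCostSketch {

    // Counts the records that leave the partitions for the final distinct step.
    // With the combiner on, each partition removes its own duplicates first.
    static <T> int recordsShuffled(List<List<T>> partitions, boolean combinable) {
        int shuffled = 0;
        for (List<T> partition : partitions) {
            shuffled += combinable ? new HashSet<>(partition).size() : partition.size();
        }
        return shuffled;
    }

    public static void main(String[] args) {
        // Few duplicates: the only repeated value sits in a different partition,
        // so the per-partition combine cannot remove it.
        List<List<Integer>> fewDuplicates = Arrays.asList(
                Arrays.asList(1, 2, 3),
                Arrays.asList(4, 5, 1));
        // Many duplicates: every record in a partition is the same value.
        List<List<Integer>> manyDuplicates = Arrays.asList(
                Arrays.asList(7, 7, 7),
                Arrays.asList(7, 7, 7));

        System.out.println(recordsShuffled(fewDuplicates, true));
        System.out.println(recordsShuffled(fewDuplicates, false));
        System.out.println(recordsShuffled(manyDuplicates, true));
        System.out.println(recordsShuffled(manyDuplicates, false));
    }
}
```

When the combined and uncombined counts are close, the combine step has done its deduplication work without shrinking the shuffle, which is the case where disabling it would be expected to help.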

This message was sent by Atlassian JIRA
