beam-commits mailing list archives

From "Jingsong Lee (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-2477) BeamAggregationRel should use Combine.perKey instead of GroupByKey
Date Tue, 20 Jun 2017 09:10:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-2477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16055416#comment-16055416 ]

Jingsong Lee commented on BEAM-2477:
------------------------------------

*Local combine*: Cloud Dataflow and Flink (batch) optimize Combine operations (such as Count
and Sum) by partially combining the data locally before sending it to the main grouping
operation. See "Graph optimizations" in https://cloud.google.com/blog/big-data/2017/05/after-lambda-exactly-once-processing-in-cloud-dataflow-part-2-ensuring-low-latency
*Incremental aggregation*: Similar to Flink's concept of a window function with incremental
aggregation, see https://ci.apache.org/projects/flink/flink-docs-release-1.3/dev/windows.html#windowfunction-with-incremental-aggregation

GroupByKey, by contrast, keeps every element of a window until the window closes (AFAIK this
is the case in the Flink Runner).
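
To make the difference concrete, here is a minimal sketch in plain Java. It does not use the Beam API: the class `CombineSketch`, its two methods, and the "worker" lists are hypothetical names chosen only for illustration. It contrasts buffering every value per key before reducing (what a GroupByKey followed by a sum does) with keeping one running accumulator per key on each worker and merging only those accumulators (roughly what a runner may do for a Combine).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CombineSketch {

    /** GroupByKey style: buffer every value per key, reduce only at the end. */
    public static Map<String, Long> groupThenSum(List<List<Map.Entry<String, Long>>> workers) {
        Map<String, List<Long>> grouped = new HashMap<>();
        for (List<Map.Entry<String, Long>> worker : workers) {
            for (Map.Entry<String, Long> kv : worker) {
                // Every single element is kept until all input has been seen.
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
            }
        }
        Map<String, Long> result = new HashMap<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            long sum = 0;
            for (long v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    /** Combine style: each worker pre-aggregates, only accumulators are merged. */
    public static Map<String, Long> combinePerKey(List<List<Map.Entry<String, Long>>> workers) {
        Map<String, Long> merged = new HashMap<>();
        for (List<Map.Entry<String, Long>> worker : workers) {
            // Local combine: one running sum per key, updated as elements arrive.
            Map<String, Long> partial = new HashMap<>();
            for (Map.Entry<String, Long> kv : worker) {
                partial.merge(kv.getKey(), kv.getValue(), Long::sum);
            }
            // Only one accumulator per key per worker reaches the final merge.
            partial.forEach((k, v) -> merged.merge(k, v, Long::sum));
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Long>>> workers = List.of(
            List.of(Map.entry("a", 1L), Map.entry("b", 2L), Map.entry("a", 3L)),
            List.of(Map.entry("a", 4L), Map.entry("b", 5L)));
        // Both strategies produce the same sums; only the amount of buffered data differs.
        System.out.println(groupThenSum(workers).equals(combinePerKey(workers))); // true
    }
}
```

The results are identical, which matches the point that the semantics are the same. What changes is how much state is held and moved: the first method stores every element, the second stores one `Long` per key.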

> BeamAggregationRel should use Combine.perKey instead of GroupByKey
> ------------------------------------------------------------------
>
>                 Key: BEAM-2477
>                 URL: https://issues.apache.org/jira/browse/BEAM-2477
>             Project: Beam
>          Issue Type: Improvement
>          Components: dsl-sql
>            Reporter: Jingsong Lee
>            Assignee: Jingsong Lee
>              Labels: dsl_sql_merge
>
> The two have the same semantics, but their implementations differ greatly in efficiency,
> and runners apply many optimizations to `Combine.perKey`.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
