flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-6969) Add support for deferred computation for group window aggregates
Date Wed, 28 Jun 2017 10:32:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-6969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16066306#comment-16066306
] 

ASF GitHub Bot commented on FLINK-6969:
---------------------------------------

Github user wuchong commented on the issue:

    https://github.com/apache/flink/pull/4183
  
    Hi @fhueske , jincheng and me have discussed offline and have an unified opinion.
    
    We think the approach of window(with custom trigger) + custom operator will increase the
latency when multiple window aggregate is applied. For example, defer 1 hour to compute with
two level window aggregates. The output and watermark of the first level window have been
deferred 1 hour. In fact, the output is in order now. However, the second window will still
defer 1 hour to compute which result in the result is delayed for two hours. 
    
    We think we only need to hold the watermark back at source, then all the downstream window
operators will defer the given offset to compute. And the end-to-end latency is the given
offset.
    
    Therefor, we propose to add a custom operator (to offset the watermark) after the source
(the DataStream in the physical plan tree root). And no custom trigger needed.
    
    What do you think? 
    
    
    



> Add support for deferred computation for group window aggregates
> ----------------------------------------------------------------
>
>                 Key: FLINK-6969
>                 URL: https://issues.apache.org/jira/browse/FLINK-6969
>             Project: Flink
>          Issue Type: New Feature
>          Components: Table API & SQL
>            Reporter: Fabian Hueske
>            Assignee: sunjincheng
>
> Deferred computation is a strategy to deal with late arriving data and avoid updates
of previous results. Instead of computing a result as soon as it is possible (i.e., when a
corresponding watermark was received), deferred computation adds a configurable amount of
slack time in which late data is accepted before the result is compute. For example, instead
of computing a tumbling window of 1 hour at each full hour, we can add a deferred computation
interval of 15 minute to compute the result quarter past each full hour.
> This approach adds latency but can reduce the number of update esp. in use cases where
the user cannot influence the generation of watermarks. It is also useful if the data is emitted
to a system that cannot update result (files or Kafka). The deferred computation interval
should be configured via the {{QueryConfig}}.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

Mime
View raw message