beam-commits mailing list archives

From "Etienne Chauchot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (BEAM-4604) Support Triggers for "GroupIntoBatches" Transform
Date Thu, 21 Jun 2018 09:07:00 GMT

    [ https://issues.apache.org/jira/browse/BEAM-4604?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16519106#comment-16519106 ]

Etienne Chauchot commented on BEAM-4604:
----------------------------------------

Hi [~bushkovsky], GroupIntoBatches does indeed use state to record the batched elements (plus
some technical data) and timers to detect the end of the window.

There are two conditions that trigger output:
 # The watermark passes the end of the window plus the allowed lateness. In other words, a timer
is set in event time to fire at the end of the window, regardless of the number of batched
elements. This respects event-time triggering and avoids messing up windows.
 # The batch reaches the configured size within a single window.

In your case the processing-time trigger (1 sec delay after the first element is received)
might not fire, because the window has not closed and the batch does not yet contain batchSize
elements. If it did output data, that would violate the batch concept.
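
To make the two conditions concrete, here is a minimal plain-Java sketch for a single key and a single window. It is not the GroupIntoBatches implementation and does not use the Beam API. The class BatchBufferSketch and its methods are hypothetical names made up for illustration, and it assumes the event-time timer fires once the watermark reaches end of window + allowed lateness.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Simplified model (one key, one window) of the two flush conditions.
 * Hypothetical illustration only; this is NOT Beam code.
 */
class BatchBufferSketch {
    private final int batchSize;
    private final long windowEndMillis;
    private final long allowedLatenessMillis;
    // Stands in for the state that records the batched elements.
    private final List<String> buffer = new ArrayList<>();
    // Batches that have been output so far.
    private final List<List<String>> emitted = new ArrayList<>();

    BatchBufferSketch(int batchSize, long windowEndMillis, long allowedLatenessMillis) {
        this.batchSize = batchSize;
        this.windowEndMillis = windowEndMillis;
        this.allowedLatenessMillis = allowedLatenessMillis;
    }

    /** Condition 2: buffer the element and flush once the batch is full. */
    void addElement(String element) {
        buffer.add(element);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /**
     * Condition 1: stands in for the event-time timer. Flushes whatever is
     * buffered once the watermark reaches window end + allowed lateness
     * (the exact firing boundary is an assumption of this sketch).
     */
    void advanceWatermark(long watermarkMillis) {
        if (watermarkMillis >= windowEndMillis + allowedLatenessMillis) {
            flush();
        }
    }

    /** Processing time alone is not a flush condition in this model. */
    void advanceProcessingTime(long processingTimeMillis) {
        // Intentionally a no-op.
    }

    private void flush() {
        if (!buffer.isEmpty()) {
            emitted.add(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    List<List<String>> emitted() {
        return emitted;
    }

    public static void main(String[] args) {
        BatchBufferSketch batches = new BatchBufferSketch(3, 1000L, 0L);
        batches.addElement("a");
        batches.advanceProcessingTime(5000L); // no output: neither condition holds
        batches.addElement("b");
        batches.addElement("c");              // batch is full: first output
        batches.addElement("d");
        batches.advanceWatermark(1000L);      // window closed: partial batch is output
        System.out.println(batches.emitted());
    }
}
```

In this model, waiting in processing time after the first element changes nothing: output happens only when the batch is full or the window has closed.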

[~kenn]: Kenn, do you see anything I missed that would require changing my implementation
of GroupIntoBatches?

> Support Triggers for "GroupIntoBatches" Transform
> -------------------------------------------------
>
>                 Key: BEAM-4604
>                 URL: https://issues.apache.org/jira/browse/BEAM-4604
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>            Reporter: Oleksandr Bushkovskyi
>            Priority: Major
>
> I think it makes sense to implement triggering support for the "GroupIntoBatches" transform.
> I've spent quite a long time trying to understand why my triggering behavior doesn't
> work with "GroupIntoBatches".
> This transform has exactly the same signature as the "GroupByKey" transform, and a similar name.
> It's confusing that these two transforms, which look similar from the outside, work
> differently with triggers.
> At the very least, the "GroupIntoBatches" documentation should clearly state that it doesn't
> support triggers.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
