flink-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jaromir Vanek <vanek.jaro...@gmail.com>
Subject Deterministic processing with out-of-order streams
Date Fri, 04 Nov 2016 22:38:40 GMT
Is Flink processing really repeatedly deterministic when incoming stream of
elements is out-of-order? How is it ensured?

I am aware of all the principles like event time and watermarking. But I
can't understand how it works in case there are late elements in stream -
that means there are elements violating the watermark condition - having
lower timestamp than previously emitted watermark. From my point of view
these elements will flow through the system without any mechanism that would
discard them. Late elements then may, or may not, fall into existing
windows. 

Let's draw simple example reading from Kafka source with two partitions.
Numbers are representing event time. Data from Kafka are shuffled to the one
WindowOperator calculating sum (all elements have the same key).

                                                
---------------------------------------|
part. 1   | ..., 15, 12, 9  |  ------->|                                              
|
                                                |          WindowOperator             
|
part. 2   | ..., 18, 6, 11  |  ------->|    (window maxTimestamp 10)    |
                                               
-----------------------------------------

Elements can arrive to WindowOperator in arbitrary order

example1 (E denotes element, W denotes watermark)

E 9
W 9
E 11 
W 11 (current watermark: min(9, 11) = 9)
E 6  
E 12
W 12 (current watermark: min(9, 12) = 12)
---------------
window(0, 10) fires with sum 15

example2:

E 9
W 9
E 11
W 11 (current watermark: min(9, 11) = 9) 
E 12
W 12 (current watermark: min(9, 12) = 12)
---------------
window fires with sum 9


In my example result is not deterministic, it's more less random. Is there
anything I am missing?

Thank you very much for explanation.



--
View this message in context: http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/Deterministic-processing-with-out-of-order-streams-tp14409.html
Sent from the Apache Flink Mailing List archive. mailing list archive at Nabble.com.

Mime
View raw message