flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Aljoscha Krettek <aljos...@apache.org>
Subject Re: Issues with Event Time and Kafka
Date Mon, 13 Mar 2017 15:24:04 GMT
What Robert said is correct. However, that behaviour depends on the
Trigger. You can write your own Trigger that behaves differently when
late data arrives, that is, you could write a trigger that never fires
for late data. In that case, you can also simply set the allowed
lateness to zero, however. You could also write a trigger that waits for
a certain number of late elements to arrive and then triggers a firing.


Best,

Aljoscha





On Fri, Mar 10, 2017, at 20:14, Robert Metzger wrote:

> Hi Ethan,

> 

> how late elements (elements with event time after the watermark) are
> handled depends on the operator. Flink's window operators will trigger
> a single event window when they fall into the "allowed lateness"
> timeframe. Otherwise, they are dropped.
> 

> On Thu, Mar 9, 2017 at 5:30 PM, ext.eformichella
> <ext.eformichella@riotgames.com> wrote:
>> Thanks for the suggestion, we can definitely try that out.

>> 

>> My one concern there is that events technically can lag for days or
>> even months in some cases, but we only care about including the
>> events that lag for 30 minutes or so, and would like the further
>> lagging events to be ignored - I just want to make sure that doesn't
>> require special handling.
>> 

>> I also just want to make sure I'm understanding the maximum lateness
>> watermark correctly. Suppose a watermark gets generated, and then an
>> element with an older timestamp is found. My understanding was that
>> that element should be ignored, but from our results it looks like
>> the late element actually overwrites the aggregate of the on-time
>> elements. Is this expected behavior?
>> 

>> Thank you for your help!

>> -Ethan

>> 

>> On Tue, Mar 7, 2017 at 6:01 PM, Dawid Wysakowicz [via Apache Flink
>> User Mailing List archive.] <[hidden email][1]> wrote:
>>> 

>>> Hi Ethan,

>>> 

>>> I believe then it is because the Watermark and Timestamps in your
>>> implementation are uncorrelated. What Watermark really is a marker
>>> that says there will be no elements with timestamp smaller than the
>>> value of this watermark. For more info on the concept see [1][2].
>>> 

>>> In your case as you say that events can "lag" for 30 minutes, you
>>> should try the BoundedOutOfOrdernessTimestampExtractor. It is
>>> designed exactly for a case like yours.
>>> 

>>> Regards,

>>> Dawid

>>> 

>>> 

>>> 2017-03-07 22:33 GMT+01:00 ext.eformichella <[hidden email][3]>:

>>>> Hi Dawid, I'm working with Max on the project Our code for the
>>>> TimestampAndWatermarkAssigner is: ``` class
>>>> TimestampAndWatermarkAssigner(val maxLateness: Long) extends
>>>> AssignerWithPeriodicWatermarks[Row] {
>>>>
>>>>    override def extractTimestamp(element: Row,
>>>>    previousElementTimestamp: Long): Long = {  element.minTime }
>>>>
>>>>    override def getCurrentWatermark(): Watermark = {  new
>>>>    Watermark(System.currentTimeMillis() - maxLateness) } } ```
>>>>
>>>>  Where Row is a class representing the incoming JSON object coming
>>>>  from Kafka, which includes the timestamp
>>>>
>>>>  Thanks, -Ethan
>>>>
>>>>
>>>>
>>>>  --
>>>>  View this message in context:
>>>>  http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Issues-with-Event-Time-and-Kafka-tp12061p12090.html
>>>> Sent from the Apache Flink User Mailing List archive. mailing list
>>>> archive at Nabble.com.
>>>> 

>>> 

>>> 
>>> 

>>> 

>>> If you reply to this email, your message will be added to the
>>> discussion below:
>>> http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Issues-with-Event-Time-and-Kafka-tp12061p12092.html
>>> 

>>> To unsubscribe from Issues with Event Time and Kafka, click here.
>>> NAML[4]
>>> 

>>
>> View this message in context:Re: Issues with Event Time and Kafka[5]
>> Sent from the Apache Flink User Mailing List archive. mailing list
>> archive[6] at Nabble.com.



Links:

  1. http:///user/SendEmail.jtp?type=node&node=12139&i=0
  2. https://ci.apache.org/projects/flink/flink-docs-release-1.2/dev/event_time.html#event-time-and-watermarks
  3. http:///user/SendEmail.jtp?type=node&node=12092&i=0
  4. http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml
  5. http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/Issues-with-Event-Time-and-Kafka-tp12061p12139.html
  6. http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/

Mime
View raw message