flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Álvaro Vilaplana García <alvaro.vilapl...@gmail.com>
Subject Re: Problem with ProcessFunction timeout feature
Date Fri, 23 Jun 2017 07:51:53 GMT
Hi Stefan,

Thank you so much for your answer.

Regarding the 'artificial events', our main problem is that we have no
control at all in the devices.

I have been reading more about event time and watermarks and what I
understood is that when we use event times (device times) Flink does not
know anything about notion of time and the watermark is a way to help Flink
to set the time of the stream (no more events with event time earlier than
the watermark). That would explain that we need always an event to set the
watermark. Does it make sense?

I understood that the watermarks will be per partition (ByKey(deviceId)),
is that right?


2017-06-22 16:26 GMT+01:00 Stefan Richter <s.richter@data-artisans.com>:

> Hi,
> if I understand correctly, your problem is that event time does not
> progress in case you don’t receive events, so you cannot detect the timeout
> of devices. Would it make sense to have you source periodically send
> artificial events to advance the watermark in the absence of device events,
> with a certain gap for which you can safely assume that you will no longer
> receive events with a smaller timestamp from any device in the future?
> Because, how else could Flink advance event time without receiving further
> events?
> Best,
> Stefan
> > Am 22.06.2017 um 16:35 schrieb Álvaro Vilaplana García <
> alvaro.vilaplana@gmail.com>:
> >
> > Hi,
> >
> > Please, can you help me with a problem? I summarise in the next points,
> I hope is enough clear to approach some help.
> >
> >
> > a) We have devices, each with its own ID, which we don’t have control of
> >
> > b) These devices send messages, with an internally generated, non-synced
> (amongst other devices) timestamp
> >
> > c) We want to detect when each devices may stop sending messages
> >
> > d) For that, we are using a ProcessFunction
> >
> > e) The devices put the messages in a Kafka topic, partitioned by ID.
> >
> > f) We are struggling with the ProcessFunction timeout feature:
> >
> > We cannot rely on real time (processing time), since the messages from
> the devices may be delayed (even if their timestamp does not show these
> delays) - so we rely on device timestamps instead.
> > In our case an event comes in which: "Registers a timer to be fired when
> the event time watermark passes the given time". The problem we have is
> there are cases where we do not get an additional event after the first
> event- which means that the original event timeouts are not triggered.
> >
> > As a side note we've seen in unit tests that flink seems to set a
> watermark after the last event with a Long.MaxValue (9223372036854775807) -
> which hides the above problem.
> >
> > I am using Scala 2.11 /Flink versions 1.2.0
> >
> > Regards
> > --
> > ______________________________
> >
> > Álvaro Vilaplana García


Álvaro Vilaplana García

View raw message