flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Anwar Rizal <anriza...@gmail.com>
Subject Re: Where does Flink store intermediate state and triggering/scheduling a delayed event?
Date Wed, 03 Feb 2016 10:17:53 GMT
Allow me to jump to this very interesting discussion.

The 2nd point is actually an interesting question.

I understand that we can set a timestamp of event in Flink. What if we set
the timestamp to somewhere in the future, for example 24 hours from now ?
Can Flink handle this case ?


Also , I'm still unclear whether the windowing can also be backed up by a
backend like RocksDB. Concretely, can we have a time window of 24 hours
while the tps is 100 TPS ?

Anwar.

On Wed, Feb 3, 2016 at 10:12 AM, Fabian Hueske <fhueske@gmail.com> wrote:

> Hi,
>
> 1) At the moment, state is kept on the JVM heap in a regular HashMap.
>
> However, we added an interface for pluggable state backends. State
> backends store the operator state (Flink's built-in window operators are
> based on operator state as well). A pull request to add a RocksDB backend
> (going to disk) will be merged soon [1]. Another backend using Flink's
> managed memory is planned.
>
> 2) I am not sure what you mean by trigger / schedule a delayed event, but
> have a few pointers that might be helpful:
> - Flink can handle late arriving events. Check the event-time feature [2].
> - Flink's window triggers can be used to schedule window computations [3]
> - You can implement a custom source function that emits / triggers events.
>
> Best, Fabian
>
> [1] https://github.com/apache/flink/pull/1562
> [2]
> http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
> [3] http://flink.apache.org/news/2015/12/04/Introducing-windows.html
>
> 2016-02-03 5:39 GMT+01:00 Soumya Simanta <soumya.simanta@gmail.com>:
>
>> I'm getting started with Flink and had a very fundamental doubt.
>>
>> 1) Where does Flink capture/store intermediate state?
>>
>> For example, two streams of data have a common key. The streams can lag
>> in time (second, hours or even days). My understanding is that Flink
>> somehow needs to store the data from the first (faster) stream so that it
>> can match and join the data with the second(slower) stream.
>>
>> 2) Is there a mechanism to trigger/schedule a delayed event in Flink?
>>
>> Thanks
>> -Soumya
>>
>>
>>
>>
>

Mime
View raw message