flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yassine MARZOUGUI <y.marzou...@mindlytix.com>
Subject Re: Appropriate State to use to buffer events in ProcessFunction
Date Thu, 09 Mar 2017 15:19:19 GMT
Hi Timo,

I thought about the ListState but quickly discarded It as it keeps the
insersion order and not events order. After a second thought I think I will
reconsider it since my events are occaionally out-of-order. Didn't know
that Flink CEP operators 'next' and 'within', can handle event time, so I
think I will give it a try! Thank you!

Best,
Yassine

2017-03-08 9:55 GMT+01:00 Timo Walther <twalthr@apache.org>:

> Hi Yassine,
>
> have you thought about using a ListState? As far as I know, it keeps at
> least the insertion order. You could sort it once your trigger event has
> arrived.
> If you use a RocksDB as state backend, 100+ GB of state should not be a
> problem. Have you thought about using Flink's CEP library? It might fit to
> your needs without implementing a custom process function.
>
> I hope that helps.
>
> Timo
>
>
> Am 07/03/17 um 19:23 schrieb Yassine MARZOUGUI:
>
> Hi all,
>>
>> I want to label events in a stream based on a condition on some future
>> events.
>> For example my stream contains events of type A and B and and I would
>> like to assign a label 1 to an event E of type A if an event of type B
>> happens within a duration x of E. I am using event time and my events can
>> be out of order.
>> For this I'm using ProcessFunction which looks suitable for my use case.
>> In order to handle out of order events, I'm keeping events of type A in a
>> state and once an event of type B is received, I fire an event time timer
>> in which I loop through events of type A in the state having a timestamps <
>> timer.timestamp, label them and remove them from the state.
>> Currently the state is simply a value state containing a
>> TreeMap<Timestamp, EventA>. I'm keeping events sorted in order to
>> effectively get events older than the timer timestamp.
>> I wonder If that's the appropriate data structure to use in the value
>> state to buffer events and be able to handle out of orderness, or if there
>> is a more effective implementation, especially that the state may grow to
>> reach ~100 GB sometimes?
>>
>> Any insight is appreciated.
>>
>> Thanks,
>> Yassine
>>
>>
>>
>>
>

Mime
View raw message