flink-user mailing list archives

From Ning Shi <nings...@gmail.com>
Subject Re: Low Performance in High Cardinality Big Window Application
Date Mon, 27 Aug 2018 14:14:23 GMT
> If you have a window larger than hours then you need to rethink your architecture - this
> is not streaming anymore. Only because you receive events in a streamed fashion you don't
> need to do all the processing in a streamed fashion.

Thanks for the thoughts, I'll keep that in mind. However, in the test, it was not storing
more than two days' worth of data yet. I'm very much interested in understanding the root
cause of the low performance before moving on to any major restructuring.

> Can you store the events in a file or a database and then do after 30 days batch processing
> on them?

The 30-day window is only used for deduplication, but it triggers for every event and sends
the result downstream so that we can still get real-time analytics on the events.

> Another aspect could be also to investigate why your source sends duplicated entries.

They are not 100% duplicate events syntactically. The events are only duplicates in a logical
sense. For example, the same person performing the same action multiple times at different
times of day.
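
To make the "logical duplicate" idea concrete, here is a minimal sketch of deduplicating on a logical key (user + action, ignoring the timestamp) while still emitting every event downstream, tagged as duplicate or not. This is plain Java for illustration only; the class and record names are hypothetical, and in the actual Flink job the `seen` set would instead be keyed state (e.g. `ValueState`) scoped to the 30-day window rather than an in-memory `HashSet`.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LogicalDedup {
    // An event carries a timestamp, but the logical key deliberately ignores it.
    public record Event(String user, String action, long ts) {}
    public record Tagged(Event event, boolean duplicate) {}

    public static List<Tagged> process(List<Event> events) {
        Set<String> seen = new HashSet<>();            // stand-in for Flink keyed state
        List<Tagged> out = new ArrayList<>();
        for (Event e : events) {
            String key = e.user() + "|" + e.action();  // logical key: user + action
            // Set.add returns false if the key was already present,
            // so every event is emitted, flagged on repeat occurrences.
            out.add(new Tagged(e, !seen.add(key)));
        }
        return out;
    }

    public static void main(String[] args) {
        List<Event> in = List.of(
            new Event("alice", "click", 100),
            new Event("alice", "click", 200),  // logical duplicate, different time
            new Event("bob",   "click", 150));
        for (Tagged t : process(in)) {
            System.out.println(t.event().user() + " " + t.event().action()
                + " dup=" + t.duplicate());
        }
    }
}
```

Emitting every event with a flag, rather than dropping duplicates, mirrors the per-event triggering described above: downstream consumers still see real-time results and can filter on the flag themselves.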
