flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Akshay Shinde <akshay.shi...@oracle.com>
Subject Dedup all data in stream
Date Wed, 12 Feb 2020 02:34:29 GMT
Hi Community 

In our Flink job, in source we are creating our own stream to process n number of objects
per 2 minutes. And in process function for each object from generated source stream we are
doing some operation which we expect to get finished in 2 minutes.

Every 2 minutes we are generating same ’N’ objects in stream which process function will
process.  But in some cases process function is taking longer time around 10 minutes. In this
case stream will have 5 number of sets for ’N’ objects as process function is waiting
for 10 minutes as source is adding ’N’ objects in stream at every 2 minutes. Problem is
we don’t want to process these objects 5 times, we want it to process only once for the
latest ’N’ objects.   

This lag can be more or less from process function which results in lag from source to process
function in job execution.

Thanks in advance !!!
View raw message