flink-user mailing list archives

From Chen Qin <qinnc...@gmail.com>
Subject Re: Late arriving events
Date Wed, 06 Jul 2016 23:17:21 GMT

Sorry for the late reply; some of my thoughts are inline.


> Another way to do this is to kick off a parallel job to do the backfill
> from the previous savepoint without stopping the current "realtime" job.
> This way you would not have to have a "blackout".  This assumes your final
> sink can tolerate having some parallel writes to it OR you have two
> different sinks and throw a switch from one to another for downstream jobs,
> etc.

Sounds great to me. I think it will solve the "blackout" issue I mentioned.
The sink might have to work in more of a read-check-write fashion, but that
should be fine.
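
To make "read-check-write" a bit more concrete, here is a minimal sketch in plain Java, not tied to any Flink sink API. Everything in it is an assumption for illustration: the in-memory map stands in for the real external store, and the `generation` number is a hypothetical way to let a result from the backfill job supersede one from the realtime job for the same window.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical sink wrapper: reads the stored result, checks whether the
// incoming one should win, and only then writes.
class ReadCheckWriteSink {

    // One stored result per (keyword, window); 'generation' marks which job run produced it.
    private static final class Result {
        final long count;
        final long generation;

        Result(long count, long generation) {
            this.count = count;
            this.generation = generation;
        }
    }

    // In-memory stand-in for the external store (assumption, not a real sink).
    private final ConcurrentMap<String, Result> store = new ConcurrentHashMap<>();

    // Returns true if the write was applied, false if an equal or newer result was already stored.
    boolean write(String keyword, long windowStart, long count, long generation) {
        String key = keyword + "@" + windowStart;
        Result incoming = new Result(count, generation);
        while (true) {
            Result existing = store.get(key);                      // read
            if (existing == null) {
                if (store.putIfAbsent(key, incoming) == null) {
                    return true;                                   // nothing stored yet: write
                }
                continue;                                          // lost a race with another writer, retry
            }
            if (existing.generation >= incoming.generation) {      // check
                return false;
            }
            if (store.replace(key, existing, incoming)) {          // write, only if unchanged since the read
                return true;
            }
        }
    }

    // Returns the stored count, or null if nothing was written for that window.
    Long read(String keyword, long windowStart) {
        Result r = store.get(keyword + "@" + windowStart);
        return r == null ? null : r.count;
    }
}
```

With this, the realtime job could write with generation 1 and the backfill job with generation 2, so parallel writes to the same window resolve in favour of the backfill regardless of arrival order. A real store would need its own conditional-write primitive in place of `replace`.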

>> In general, I don't know if there are good solutions for all of these
>> scenarios. Some might keep messages in windows longer (messages not purged
>> yet). Some might kick off another pipeline just dealing with affected
>> windows (messages already purged). What would be suggested patterns?
> Of course, ideally you would just keep data in windows longer such that
> you don't purge window state until you're sure there is no more data
> coming.  The problem with this approach in the real world is that you may
> be wrong with whatever time you choose ;)  I would suggest doing the best
> job possible upfront by using an appropriate watermark strategy to deal
> with most of the data.  Then process the truly late data with a separate
> path in the application code.  This "separate" path may have to deal with
> merging late data with the data that's already been written to the sink but
> this is definitely possible depending on the sink.
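
The split described above (a watermark for most of the data, a separate path for the truly late events) could look roughly like this. It is a plain-Java sketch under stated assumptions, not Flink's API: tumbling windows, a watermark that trails the largest timestamp seen by a fixed bound, and a window treated as closed once the watermark reaches its last millisecond. `LateEventRouter` and its fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical router: decides per event whether its window is still open
// (normal path) or has already been passed by the watermark (late path).
class LateEventRouter {
    private final long windowSizeMs;
    private final long maxOutOfOrdernessMs;
    private long maxTimestampSeen = Long.MIN_VALUE;

    final List<Long> onTime = new ArrayList<>();  // timestamps sent down the normal path
    final List<Long> late = new ArrayList<>();    // timestamps sent down the separate late path

    LateEventRouter(long windowSizeMs, long maxOutOfOrdernessMs) {
        this.windowSizeMs = windowSizeMs;
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    // Watermark = largest timestamp seen so far minus the allowed out-of-orderness.
    long currentWatermark() {
        return maxTimestampSeen == Long.MIN_VALUE
                ? Long.MIN_VALUE
                : maxTimestampSeen - maxOutOfOrdernessMs;
    }

    // Routes one event; returns true if it went to the late path.
    boolean route(long eventTimestampMs) {
        // End (exclusive) of the tumbling window this event belongs to.
        long windowEnd = eventTimestampMs
                - Math.floorMod(eventTimestampMs, windowSizeMs) + windowSizeMs;
        // Late if the watermark, as it stood before this event, already passed the window.
        boolean isLate = currentWatermark() >= windowEnd - 1;
        (isLate ? late : onTime).add(eventTimestampMs);
        maxTimestampSeen = Math.max(maxTimestampSeen, eventTimestampMs);
        return isLate;
    }
}
```

The out-of-orderness bound is the "time you choose" from the quoted text: a larger bound sends fewer events down the late path but delays every window result by the same amount.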

Makes sense. Truly late events should go through a side job that merges them
with whatever has already been written to the sink. That might also imply
both sinks are able to do

E.g., a job has been doing search keyword counts from the beginning, and an
outage caused some hosts, partitioned by keyword, to go down for a couple of
days. A backfill job starts loading and adding counts. After it has
backfilled all the missing keywords and merged the aggregation results, it
might need to write to current windows that have not been written yet and
let the main job pick up and merge the results.
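
A rough sketch of that keyword-count merge, in plain Java with an in-memory map standing in for the sink. The class and method names are hypothetical, and it assumes counts are additive, so the main job and the backfill job can each add to the same (keyword, window) cell in either order. The set of applied backfills is an extra assumption so that a retried backfill write does not double count. It is single-threaded for brevity.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical merge logic for the keyword-count example.
class KeywordCountBackfill {
    // In-memory stand-in for the sink: one running count per (keyword, window).
    private final Map<String, Long> sinkCounts = new HashMap<>();
    // (keyword, window) cells whose backfill has already been merged.
    private final Set<String> appliedBackfills = new HashSet<>();

    private static String key(String keyword, long windowStart) {
        return keyword + "@" + windowStart;
    }

    // Main job path: add this window's count to whatever is already stored.
    void writeFromMainJob(String keyword, long windowStart, long count) {
        sinkCounts.merge(key(keyword, windowStart), count, Long::sum);
    }

    // Backfill path: add the missing count once; returns false on a repeated attempt.
    boolean applyBackfill(String keyword, long windowStart, long missingCount) {
        String k = key(keyword, windowStart);
        if (!appliedBackfills.add(k)) {
            return false;  // this cell was already backfilled, skip to avoid double counting
        }
        sinkCounts.merge(k, missingCount, Long::sum);
        return true;
    }

    // Current merged count for a cell, 0 if nothing has been written.
    long count(String keyword, long windowStart) {
        return sinkCounts.getOrDefault(key(keyword, windowStart), 0L);
    }
}
```

Because the merge is a plain sum, it does not matter whether the backfill lands before or after the main job writes the same window, which is what lets the two jobs run in parallel.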
