flink-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Chengzhi Zhao <w.zhaocheng...@gmail.com>
Subject Watermark Question on Failed Process
Date Mon, 02 Apr 2018 21:51:18 GMT
Hello, flink community,

I am using period watermark and extract the event time from each records
from files in S3. I am using the `TimeLagWatermarkGenerator` as it was
mentioned in flink documentation.

Currently, a new watermark will be generated using processing time by fixed
amount

override def getCurrentWatermark: Watermark = {
    new Watermark(System.currentTimeMillis() - maxTimeLag)
}

This would work fine as long as process is running. However, in case of
failures, I mean if there was some bad data or out of memory occurs, I need
to stop the process and it will take me time to get back. If the maxTimeLag=3
hours, and it took me 12 hours to realize and fix it.

My question is since I am using processing time as part of the watermark,
when flink resumed from failures, will some records might be ignored by the
watermark? And what's the best practice to catchup and continue without
losing data? Thanks!

Best,
Chengzhi

Mime
View raw message