flink-user mailing list archives

From Dongwon Kim <eastcirc...@gmail.com>
Subject Restore from a savepoint is very slow
Date Mon, 02 Apr 2018 05:30:59 GMT
Hi,

When the job is restarted, restoring from the latest checkpoint starts immediately,
but restoring from a savepoint takes more than five minutes before the job makes any progress.
During this blackout, I observe no resource usage across the cluster.
After that period, various metrics, including flink_taskmanager_job_task_currentLowWatermark,
show that Flink is catching up with the source topic.

FYI, I'm using
- Flink-1.4.2
- FsStateBackend configured with HDFS
- EventTime with BoundedOutOfOrdernessTimestampExtractor
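The catch-up shows up in currentLowWatermark because of how a bounded-out-of-orderness watermark advances. Below is a simplified, self-contained model of that behaviour. It assumes the extractor tracks the highest event timestamp seen and emits that value minus the configured bound, and that a task's low watermark is the minimum over its inputs. This is a re-implementation for illustration, not Flink's BoundedOutOfOrdernessTimestampExtractor itself. The class names BoundedLagWatermark and LowWatermarkDemo and the 5-second bound are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model (not Flink code) of a bounded-out-of-orderness
// watermark: watermark = highest event timestamp seen so far minus the bound.
final class BoundedLagWatermark {
    private final long maxOutOfOrdernessMs;
    private long maxTimestampMs;
    private long lastEmittedMs = Long.MIN_VALUE;

    BoundedLagWatermark(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
        // Start low enough that subtracting the bound cannot underflow.
        this.maxTimestampMs = Long.MIN_VALUE + maxOutOfOrdernessMs;
    }

    // Called for every record; only a new maximum timestamp moves the watermark.
    void observe(long eventTimestampMs) {
        if (eventTimestampMs > maxTimestampMs) {
            maxTimestampMs = eventTimestampMs;
        }
    }

    // The emitted watermark never goes backwards.
    long currentWatermark() {
        long candidate = maxTimestampMs - maxOutOfOrdernessMs;
        if (candidate > lastEmittedMs) {
            lastEmittedMs = candidate;
        }
        return lastEmittedMs;
    }
}

final class LowWatermarkDemo {
    // Assumed model of a task's low watermark: the minimum over its inputs,
    // so a single input that has not advanced holds the whole task back.
    static long lowWatermark(List<BoundedLagWatermark> inputs) {
        long low = Long.MAX_VALUE;
        for (BoundedLagWatermark in : inputs) {
            low = Math.min(low, in.currentWatermark());
        }
        return low;
    }

    public static void main(String[] args) {
        BoundedLagWatermark a = new BoundedLagWatermark(5_000L);
        BoundedLagWatermark b = new BoundedLagWatermark(5_000L);
        List<BoundedLagWatermark> inputs = new ArrayList<>();
        inputs.add(a);
        inputs.add(b);

        // Input a starts replaying its backlog; input b has produced nothing yet.
        a.observe(60_000L);
        System.out.println(lowWatermark(inputs)); // prints -9223372036854775808 (Long.MIN_VALUE)

        // Once b also emits records, the low watermark jumps forward.
        b.observe(30_000L);
        System.out.println(lowWatermark(inputs)); // prints 25000
    }
}
```

In this model, a low watermark that stays flat and then climbs quickly is what replaying a backlog looks like, which matches the catch-up pattern described above; it says nothing about why the restore itself is slow.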

Each checkpoint/savepoint is about 50GB, and we have 7 servers running TaskManagers.

Best,

- Dongwon