flink-user mailing list archives

From Fritz Budiyanto <fbudi...@icloud.com>
Subject Re: Jobgraph not getting deleted from Zookeeper
Date Wed, 20 May 2020 01:24:37 GMT
Forgot to mention: the Flink version is 1.9.2.

On May 19, 2020 at 6:22 PM, Fritz Budiyanto <fbudiyan@icloud.com> wrote:


Hi All,


I have seen this issue several times: JobGraphs are not cleaned up properly. As
a result, when the Flink cluster is restarted, it attempts an HA restore from a checkpoint
that no longer exists, and the restarted cluster eventually gives up and stays down.

The workaround is to clean up the jobgraph manually from ZooKeeper. Is this a known issue?
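
For illustration only, the manual cleanup could be scripted roughly as below. This is a minimal sketch, not a confirmed procedure: it assumes the jobgraphs live under <root>/<cluster-id>/jobgraphs (as in the ZooKeeper CLI listing in this message), that you already know which job IDs are still live, and that the client object offers get_children and delete(path, recursive=True), as kazoo's KazooClient does. The names jobgraphs_path and remove_stale_jobgraphs are hypothetical helpers, not part of Flink or ZooKeeper.

```python
from typing import Iterable, List, Protocol


class ZkClient(Protocol):
    """The two client methods this sketch relies on (kazoo's KazooClient offers both)."""

    def get_children(self, path: str) -> List[str]: ...

    def delete(self, path: str, recursive: bool = False) -> None: ...


def jobgraphs_path(root: str, cluster_id: str) -> str:
    """Build the jobgraphs znode path, e.g. /flink/cluster_update/jobgraphs."""
    parts = (root, cluster_id, "jobgraphs")
    return "/" + "/".join(p.strip("/") for p in parts)


def remove_stale_jobgraphs(
    zk: ZkClient,
    root: str,
    cluster_id: str,
    live_job_ids: Iterable[str],
    dry_run: bool = True,
) -> List[str]:
    """Return stale jobgraph znode paths; delete them unless dry_run is set."""
    base = jobgraphs_path(root, cluster_id)
    live = set(live_job_ids)
    stale = [
        f"{base}/{job_id}"
        for job_id in sorted(zk.get_children(base))
        if job_id not in live
    ]
    if not dry_run:
        for path in stale:
            # Recursive, on the assumption that the job's znode may have children.
            zk.delete(path, recursive=True)
    return stale
```

With kazoo, zk would be a started KazooClient pointed at the ZooKeeper quorum. The default dry_run=True only lists what would be removed, which seems the safer default, since deleting the jobgraph of a job that is still running would break its recovery.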


2020-05-19 19:56:21,471 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Un-registering task and sending final execution state FINISHED to JobManager for task Source: kafkaConsumer[update_server] -> (DetectedUpdateMessageConverter -> Sink: update_server.detected_updates, DrivenCoordinatesMessageConverter -> Sink: update_server.driven_coordinates) 588902a8096f49845b09fa1f595d6065.
2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.taskexecutor.slot.TaskSlotTable - Free slot TaskSlot(index:0, state:ACTIVE, resource profile: ResourceProfile{cpuCores=1.7976931348623157E308, heapMemoryInMB=2147483647, directMemoryInMB=2147483647, nativeMemoryInMB=2147483647, networkMemoryInMB=2147483647, managedMemoryInMB=642}, allocationId: 29f6a5f83c832486f2d7ebe5c779fa32, jobId: 86a028b3f7aada8ffe59859ca71d6385).
2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Remove job 86a028b3f7aada8ffe59859ca71d6385 from job leader monitoring.
2020-05-19 19:56:21,622 INFO org.apache.flink.runtime.leaderretrieval.ZooKeeperLeaderRetrievalService - Stopping ZooKeeperLeaderRetrievalService /leader/86a028b3f7aada8ffe59859ca71d6385/job_manager_lock.
2020-05-19 19:56:21,623 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Close JobManager connection for job 86a028b3f7aada8ffe59859ca71d6385.
2020-05-19 19:56:21,624 INFO org.apache.flink.runtime.taskexecutor.TaskExecutor - Close JobManager connection for job 86a028b3f7aada8ffe59859ca71d6385.
2020-05-19 19:56:21,624 INFO org.apache.flink.runtime.taskexecutor.JobLeaderService - Cannot reconnect to job 86a028b3f7aada8ffe59859ca71d6385 because it is not registered.


...

Zookeeper CLI:


ls /flink/cluster_update/jobgraphs
[86a028b3f7aada8ffe59859ca71d6385]

Thanks,
Fritz