flink-user mailing list archives

From Jason Kania <jason.ka...@ymail.com>
Subject How to debug a job stuck in a deployment/run loop?
Date Fri, 24 Jan 2020 01:47:26 GMT
I am attempting to migrate from Flink 1.7.1 to 1.9.1 and have hit a problem: previously working
jobs no longer launch after being submitted. In the UI, a submitted job shows as deploying for
a period, switches to running, then returns to deploying, and the cycle repeats at regular
intervals. No exceptions or errors are visible in the logs. There is no data coming in for the
job to process, and the Kafka queues are empty.
When I look at the thread activity of the task manager running the job in top, the busiest
threads are the flink-akka threads, which sometimes jump to very high CPU usage. That is all
the information I have.
Any suggestions on how to debug this? I can set breakpoints and attach a debugger if that
helps; I am just not sure where to start.
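
One possible way to narrow down what the busy flink-akka threads are doing, offered only as a
sketch: take the decimal thread ID that top shows in per-thread mode and look it up in a thread
dump of the task manager JVM. This assumes a Java 8 style jstack dump, where each thread header
carries the native thread ID as a hexadecimal nid=0x... field. The function name
find_thread_by_tid and its parameters are hypothetical and not part of Flink or the JDK.

```python
import re

# Header line of one thread in a Java 8 style jstack dump, for example:
# "some-thread" #42 prio=5 os_prio=0 tid=0x... nid=0x1a2b runnable [0x...]
# Assumption: nid is printed in hexadecimal with a 0x prefix.
THREAD_HEADER = re.compile(r'^"(?P<name>[^"]+)".*?\bnid=(?P<nid>0x[0-9a-fA-F]+)')


def find_thread_by_tid(jstack_text, decimal_tid):
    """Return (thread_name, stack_lines) for the thread whose nid equals
    decimal_tid, or None if the dump contains no such thread."""
    lines = jstack_text.splitlines()
    for i, line in enumerate(lines):
        match = THREAD_HEADER.match(line)
        if match is None:
            continue
        # top reports thread IDs in decimal, the dump in hex: compare as integers.
        if int(match.group("nid"), 16) != decimal_tid:
            continue
        stack = []
        # The stack of one thread runs until the next blank line.
        for follow in lines[i + 1:]:
            if not follow.strip():
                break
            stack.append(follow.strip())
        return match.group("name"), stack
    return None
```

Calling the function with the text of a saved dump and the ID of the busiest thread returns
that thread's name and stack frames, which shows which code path is consuming the CPU.
Repeating this over a few dumps taken seconds apart indicates whether the thread stays in the
same place.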