flink-dev mailing list archives

From "Kruse, Sebastian" <Sebastian.Kr...@hpi.de>
Subject Heartbeat lost
Date Tue, 18 Nov 2014 08:46:31 GMT
Hi everyone,

In some of my jobs, I occasionally run into the problem that some of the task managers lose
their heartbeat connection to the job manager. The job manager itself did not crash, though.
Here is an excerpt from the dashboard:

Error: java.lang.Exception: TaskManager lost heartbeat connection to JobManager
    at org.apache.flink.runtime.taskmanager.TaskManager.registerAndRunHeartbeatLoop(TaskManager.java:847)
    at org.apache.flink.runtime.taskmanager.TaskManager.access$000(TaskManager.java:109)
    at org.apache.flink.runtime.taskmanager.TaskManager$1.run(TaskManager.java:365)

I am not sure whether this is a bug. I rather suspect that the network load or the job manager
workload is too high, so that the heartbeats do not arrive on time, but that is a mere guess.
A first step for me could be to increase the heartbeat interval.

Has any of you encountered this problem, or do you have any ideas on how to avoid it?

