flink-dev mailing list archives

From "wangzhijiang999" <wangzhijiang...@aliyun.com>
Subject Re: UpdateTaskExecutionState during JobManager failover
Date Fri, 15 Jan 2016 02:09:40 GMT
Hi Stephan,
Thank you for the detailed explanation. As you said, my opinion is to keep the task running
during JobManager failover, even if sending the status update fails.
For the first reason you mentioned, if I understand correctly, the key issue is that the status
can get out of sync between the TaskManager and the JobManager. For example, suppose the task
is in CREATED status when the JobManager fails over. When the task transitions to RUNNING, the
updateStatus message cannot be received because of the failover, so the TaskManager retries
sending the message to the JobManager until it succeeds. When the JobManager recovers, the
task is still CREATED in the JobManager's view, while in the TaskManager's view it may already
have transitioned to FINISHED. The key problem is that if the JobManager receives the FINISHED
message earlier than the RUNNING message, it will reject the FINISHED message. Suppose instead
that the task maintains a queue of outgoing messages during the JobManager failover, so that
the messages are received in sequence once the JobManager recovers. That means the RUNNING
status message must arrive before the FINISHED status message. Would there be any problems
with that approach?
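The queue idea above could be sketched roughly as follows. This is only an illustration under my own assumptions, not Flink code: the class `OrderedStatusSender`, the `State` enum, and the `transport` callback (which returns true once the JobManager has acknowledged a message) are all hypothetical names. The point is that a failed send stops the flush, so FINISHED can never overtake RUNNING.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;

/** Hypothetical sketch: buffers task state updates and delivers them strictly in order. */
class OrderedStatusSender {

    /** Simplified stand-in for the task execution states discussed above. */
    enum State { CREATED, RUNNING, FINISHED }

    // Updates that have not yet been acknowledged, oldest first.
    private final Deque<State> pending = new ArrayDeque<>();

    // Assumed transport: returns true only if the JobManager acknowledged the update.
    private final Predicate<State> transport;

    OrderedStatusSender(Predicate<State> transport) {
        this.transport = transport;
    }

    /** Records a state transition for later delivery. */
    void enqueue(State state) {
        pending.addLast(state);
    }

    /**
     * Tries to deliver queued updates from oldest to newest. Stops at the first
     * failed send, so a later update is never sent before an earlier one is
     * acknowledged. Returns how many updates were delivered in this call.
     */
    int flush() {
        int delivered = 0;
        while (!pending.isEmpty()) {
            State head = pending.peekFirst();
            if (!transport.test(head)) {
                break; // JobManager unreachable: keep head queued and retry later
            }
            pending.removeFirst();
            delivered++;
        }
        return delivered;
    }

    /** Number of updates still waiting for acknowledgement. */
    int pendingCount() {
        return pending.size();
    }
}
```

In this sketch the TaskManager would call `flush()` periodically, or whenever it learns of a new leader. While the JobManager is down nothing is delivered and the queue keeps its order. The sketch does not cover messages that were delivered but whose acknowledgement was lost, so the receiver would still need to tolerate duplicates.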
For the second reason you mentioned, I am not very clear on the mechanism of filtering
critical messages by leaderSessionID. Could you explain it in more detail?
I am trying to improve the failover process of the JobManager and TaskManager, thank you for your
Zhijiang Wang