flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-2472) Make the JobClientActor check periodically if the submitted Job is still running and if the JobManager is still alive
Date Tue, 04 Aug 2015 09:10:05 GMT

    [ https://issues.apache.org/jira/browse/FLINK-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14653325#comment-14653325 ]

ASF GitHub Bot commented on FLINK-2472:
---------------------------------------

Github user tillrohrmann commented on a diff in the pull request:

    https://github.com/apache/flink/pull/979#discussion_r36169844
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/client/JobClientActor.java ---
    @@ -59,11 +87,50 @@ public JobClientActor(
     		this.leaderSessionID = Preconditions.checkNotNull(leaderSessionID, "The leader session ID option must not be null.");
     
     		this.sysoutUpdates = sysoutUpdates;
    +		// set this to -1 to indicate the job hasn't been created yet.
    +		this.currentJobCreatedAt = -1;
     	}
     	
     	@Override
     	protected void handleMessage(Object message) {
     		
    +		// ======= Job status messages on regular intervals ==============
    +		if(message instanceof JobManagerMessages.CurrentJobStatus){
    +			JobStatus statusReport = ((JobManagerMessages.CurrentJobStatus) message).status();
    +			long timeDiff;
    +			switch(statusReport){
    +				case RUNNING:
    +					// Vincent, we happy?
    +					this.currentJobCreatedAt = -1;
    +					break;
    +				case FINISHED:
    +					// Yeah! We happy!
    +					this.currentJobCreatedAt = -1;
    +					break;
    +				case CREATED:
    --- End diff --
    
    For simple jobs, Flink supports queued scheduling, so a job may stay queued for quite some time before its execution starts.
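
    The concern above can be illustrated with a small, self-contained sketch. It is not the code from the pull request: the class QueuedJobCheck, its Status enum, and the maxQueueMillis parameter are hypothetical stand-ins, and a negative limit is assumed to mean "no limit". It only shows one way a client could track how long a job has been in CREATED without treating queuing itself as an error.

```java
// Hypothetical sketch; names and the "negative limit disables the check"
// convention are assumptions, not part of Flink or the pull request.
final class QueuedJobCheck {
    // Stand-in for the job states mentioned in the diff.
    enum Status { CREATED, RUNNING, FINISHED }

    // Sentinel meaning "the job is not currently waiting in CREATED".
    static final long NOT_WAITING = -1L;

    // Returns the updated "waiting since" timestamp for a status report.
    static long nextCreatedAt(Status status, long createdAt, long nowMillis) {
        if (status == Status.CREATED) {
            // Remember only the first time the job was seen in CREATED.
            return createdAt == NOT_WAITING ? nowMillis : createdAt;
        }
        // Any other reported state resets the timer.
        return NOT_WAITING;
    }

    // True only if a limit is configured and the job has exceeded it.
    static boolean queuedTooLong(long createdAt, long nowMillis, long maxQueueMillis) {
        if (maxQueueMillis < 0 || createdAt == NOT_WAITING) {
            return false; // no limit configured, or job is not queued
        }
        return nowMillis - createdAt > maxQueueMillis;
    }
}
```

    With queued scheduling, a caller would presumably pass a negative or very generous maxQueueMillis so that a long wait in CREATED is not reported as a failure.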


> Make the JobClientActor check periodically if the submitted Job is still running and if the JobManager is still alive
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-2472
>                 URL: https://issues.apache.org/jira/browse/FLINK-2472
>             Project: Flink
>          Issue Type: Improvement
>            Reporter: Till Rohrmann
>            Assignee: Sachin Goel
>
> If the {{JobManager}} dies without notifying any connected {{JobClientActors}}, or if the job execution finishes without sending the {{SerializedJobExecutionResult}} back to the {{JobClientActor}}, a {{JobClient.submitJobAndWait}} call may never return.
> I propose to let the {{JobClientActor}} periodically check whether the {{JobManager}} is still alive and whether the submitted job is still running. If not, the {{JobClientActor}} should complete the waiting future with an exception.
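
The proposal in the issue description can be sketched in plain Java, independent of Akka and of Flink's actual classes. Everything here is an assumption made for illustration: the class JobWatchdog, the two BooleanSupplier probes standing in for "is the JobManager alive" and "is the job still running", and the use of a CompletableFuture as the waiting future. The real JobClientActor is an actor and would exchange messages instead.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BooleanSupplier;

// Hypothetical sketch of a periodic liveness check; not Flink's implementation.
final class JobWatchdog {
    private final BooleanSupplier jobManagerAlive; // assumed probe
    private final BooleanSupplier jobStillRunning; // assumed probe
    private final CompletableFuture<String> result = new CompletableFuture<>();

    JobWatchdog(BooleanSupplier jobManagerAlive, BooleanSupplier jobStillRunning) {
        this.jobManagerAlive = jobManagerAlive;
        this.jobStillRunning = jobStillRunning;
    }

    // The future a caller blocks on while waiting for the job result.
    CompletableFuture<String> result() {
        return result;
    }

    // One round of the check: fail the waiting future if something is wrong.
    void checkOnce() {
        if (result.isDone()) {
            return; // a result or an earlier failure has already arrived
        }
        if (!jobManagerAlive.getAsBoolean()) {
            result.completeExceptionally(
                new IllegalStateException("JobManager is no longer reachable"));
        } else if (!jobStillRunning.getAsBoolean()) {
            result.completeExceptionally(
                new IllegalStateException("Job is no longer running and no result was received"));
        }
    }

    // Runs checkOnce repeatedly with the given period on the supplied scheduler.
    void start(ScheduledExecutorService scheduler, long periodMillis) {
        scheduler.scheduleAtFixedRate(
            this::checkOnce, periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }
}
```

A caller would create the watchdog, call start with a scheduler such as one from Executors.newSingleThreadScheduledExecutor(), and wait on result(); if neither a result nor a failure arrives through the normal path, the periodic check eventually unblocks the caller with an exception.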



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
