flink-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4853) Clean up JobManager registration at the ResourceManager
Date Fri, 21 Oct 2016 15:36:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4853?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15595448#comment-15595448
] 

ASF GitHub Bot commented on FLINK-4853:
---------------------------------------

Github user mxm commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2657#discussion_r84501742
  
    --- Diff: flink-runtime/src/main/java/org/apache/flink/runtime/resourcemanager/ResourceManager.java
---
    @@ -471,6 +498,149 @@ public void shutDownCluster(final ApplicationStatus finalStatus,
final String op
     	}
     
     	// ------------------------------------------------------------------------
    +	//  Testing methods
    +	// ------------------------------------------------------------------------
    +
    +	/**
    +	 * Gets the leader session id of current resourceManager.
    +	 *
    +	 * @return return the leaderSessionId of current resourceManager, this returns null
until the current resourceManager is granted leadership.
    +	 */
    +	@VisibleForTesting
    +	UUID getLeaderSessionId() {
    +		return leaderSessionId;
    +	}
    +
    +	// ------------------------------------------------------------------------
    +	//  Internal methods
    +	// ------------------------------------------------------------------------
    +
    +	private void clearState() {
    +		jobManagerRegistrations.clear();
    +		taskExecutors.clear();
    +		slotManager.clearState();
    +
    +		try {
    +			jobLeaderIdService.clear();
    +		} catch (Exception e) {
    +			onFatalError(new ResourceManagerException("Could not properly clear the job leader
id service.", e));
    +		}
    +
    +		leaderSessionId = new UUID(0, 0);
    +	}
    +
    +	/**
    +	 * Disconnects the job manager which is connected for the given job from the resource
manager.
    +	 *
    +	 * @param jobId identifying the job whose leader shall be disconnected
    +	 */
    +	protected void disconnectJobManager(JobID jobId, Exception cause) {
    +		JobManagerRegistration jobManagerRegistration = jobManagerRegistrations.remove(jobId);
    +
    +		if (jobManagerRegistration != null) {
    +			log.info("Disconnect job manager {}@{} for job {} from the resource manager.",
    +				jobManagerRegistration.getLeaderID(),
    +				jobManagerRegistration.getJobManagerGateway().getAddress(),
    +				jobId);
    +
    +			JobMasterGateway jobMasterGateway = jobManagerRegistration.getJobManagerGateway();
    +
    +			// tell the job manager about the disconnect
    +			jobMasterGateway.disconnectResourceManager(jobManagerRegistration.getLeaderID(), getLeaderSessionId(),
cause);
    +		} else {
    +			log.debug("There was no registered job manager for job {}.", jobId);
    +		}
    +	}
    +
    +	/**
    +	 * Checks whether the given resource manager leader id is matching the current leader
id.
    +	 *
    +	 * @param resourceManagerLeaderId to check
    +	 * @return True if the given leader id matches the actual leader id; otherwise false
    +	 */
    +	protected boolean isValid(UUID resourceManagerLeaderId) {
    +		if (resourceManagerLeaderId == null) {
    +			return leaderSessionId == null;
    --- End diff --
    
    Should `null` always return `false` if we assume that we use a default UUID in non high
availability mode?


> Clean up JobManager registration at the ResourceManager
> -------------------------------------------------------
>
>                 Key: FLINK-4853
>                 URL: https://issues.apache.org/jira/browse/FLINK-4853
>             Project: Flink
>          Issue Type: Sub-task
>          Components: ResourceManager
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>
> The current {{JobManager}} registration at the {{ResourceManager}} blocks threads in
the {{RpcService.execute}} pool. This is not ideal and can be avoided by not waiting on a
{{Future}} in this call.
> I propose to encapsulate the leader id retrieval operation in a distinct service so that
it can be separated from the {{ResourceManager}}. This will reduce the complexity of the {{ResourceManager}}
and make the individual components easier to test.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message