flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-4715) TaskManager should commit suicide after cancellation failure
Date Tue, 18 Oct 2016 08:27:58 GMT

    [ https://issues.apache.org/jira/browse/FLINK-4715?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15584854#comment-15584854 ]

ASF GitHub Bot commented on FLINK-4715:
---------------------------------------

Github user StephanEwen commented on a diff in the pull request:

    https://github.com/apache/flink/pull/2652#discussion_r83803660
  
    --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/taskmanager/TaskTest.java ---
    @@ -565,6 +568,50 @@ public void testOnPartitionStateUpdate() throws Exception {
     		verify(inputGate, times(1)).retriggerPartitionRequest(eq(partitionId.getPartitionId()));
     	}
     
    +	/**
    +	 * Task cancellation blocks the task canceller. Interrupt after cancel via
    +	 * cancellation watch dog.
    +	 */
    +	@Test
    +	public void testTaskCancelWatchDog() throws Exception {
    +		Configuration config = new Configuration();
    +		config.setLong(ConfigConstants.TASK_CANCELLATION_INTERVAL_MILLIS, 100);
    +		config.setLong(ConfigConstants.TASK_CANCELLATION_TIMEOUT_MILLIS, 1000);
    +
    +		Task task = createTask(InvokableBlockingInCancel.class, config);
    +		task.startTaskThread();
    +
    +		awaitLatch.await();
    +
    +		task.cancelExecution();
    +
    +		triggerLatch.await();
    +	}
    +
    +	@Test
    +	public void testReportFatalErrorAfterCancellationTimeout() throws Exception {
    +		Configuration config = new Configuration();
    +		config.setLong(ConfigConstants.TASK_CANCELLATION_INTERVAL_MILLIS, 10);
    +		config.setLong(ConfigConstants.TASK_CANCELLATION_TIMEOUT_MILLIS, 200);
    +
    +		Task task = createTask(InvokableBlockingInvokeAndCancel.class, config);
    +		task.startTaskThread();
    +
    +		awaitLatch.await();
    +
    +		task.cancelExecution();
    +
    +		for (int i = 0; i < 10; i++) {
    +			Object msg = taskManagerMessages.poll(1, TimeUnit.SECONDS);
    +			if (msg instanceof TaskManagerMessages.FatalError) {
    +				System.out.println(msg);
    +				return; // success
    +			}
    +		}
    +
    +		fail("Did not receive expected task manager message");
    --- End diff --
    
    Does this test leave behind a lingering execution thread that loops endlessly?
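One possible way to address this concern is to give the blocking test invokable an explicit exit path that the test always triggers in a finally block, and then to join the thread with a bounded timeout. The following is a minimal sketch using plain Java threads, not Flink code: StubbornWorker and runAndCleanUp are hypothetical names, and the worker deliberately swallows interrupts to mimic an invokable that, like InvokableBlockingInvokeAndCancel in the diff appears to, blocks in both invoke and cancel.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class LingeringThreadCleanup {

    // Hypothetical stand-in for an invokable that ignores interrupts and
    // would otherwise keep looping after the test method returns.
    static final class StubbornWorker implements Runnable {
        private volatile boolean running = true;
        final CountDownLatch started = new CountDownLatch(1);

        @Override
        public void run() {
            started.countDown();
            while (running) {
                try {
                    Thread.sleep(10);
                } catch (InterruptedException ignored) {
                    // Deliberately swallow the interrupt, as faulty user code might.
                }
            }
        }

        // Explicit exit path that only the test controls.
        void stop() {
            running = false;
        }
    }

    /** Starts the worker, exercises it, and always releases it; returns true if the thread terminated. */
    static boolean runAndCleanUp() {
        StubbornWorker worker = new StubbornWorker();
        Thread thread = new Thread(worker, "stubborn-worker");
        thread.setDaemon(true); // a daemon thread cannot keep the JVM alive if cleanup fails
        thread.start();
        try {
            worker.started.await();
            thread.interrupt(); // an interrupt alone does not stop this worker
            // ... test assertions would go here ...
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            worker.stop(); // release the loop regardless of the test outcome
            try {
                thread.join(TimeUnit.SECONDS.toMillis(5)); // bounded wait so the test never hangs
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return !thread.isAlive();
    }

    public static void main(String[] args) {
        System.out.println("terminated: " + runAndCleanUp());
    }
}
```

Marking the thread as a daemon is only a fallback; the explicit stop flag is what actually guarantees that no thread outlives the test.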


> TaskManager should commit suicide after cancellation failure
> ------------------------------------------------------------
>
>                 Key: FLINK-4715
>                 URL: https://issues.apache.org/jira/browse/FLINK-4715
>             Project: Flink
>          Issue Type: Improvement
>          Components: TaskManager
>    Affects Versions: 1.2.0
>            Reporter: Till Rohrmann
>            Assignee: Ufuk Celebi
>             Fix For: 1.2.0
>
>
> In case of a failed cancellation, i.e. when the task cannot be cancelled within a given time, the {{TaskManager}} should kill itself. That way we guarantee that there is no resource leak.
>
> This behaviour acts as a safety net against faulty user code.
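The escalation described in the issue (interrupt the task thread periodically, give up after a timeout, then report a fatal error so the process can terminate) could be sketched as a small watchdog thread. This is an illustrative sketch under stated assumptions, not Flink's implementation: CancellationWatchdog, its constructor, and the Consumer<String> fatal-error callback are hypothetical, and the interval and timeout parameters are only assumed to play roles similar to TASK_CANCELLATION_INTERVAL_MILLIS and TASK_CANCELLATION_TIMEOUT_MILLIS in the test diff.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Consumer;

/** Hypothetical watchdog: interrupts a task thread periodically and escalates after a timeout. */
public class CancellationWatchdog implements Runnable {

    private final Thread taskThread;
    private final long intervalMillis;
    private final long timeoutMillis;
    private final Consumer<String> fatalErrorHandler;

    CancellationWatchdog(Thread taskThread, long intervalMillis, long timeoutMillis,
                         Consumer<String> fatalErrorHandler) {
        this.taskThread = taskThread;
        this.intervalMillis = intervalMillis;
        this.timeoutMillis = timeoutMillis;
        this.fatalErrorHandler = fatalErrorHandler;
    }

    @Override
    public void run() {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        try {
            while (taskThread.isAlive()) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    // Cancellation failed: hand over to the fatal-error path.
                    fatalErrorHandler.accept("Task thread " + taskThread.getName()
                            + " did not terminate within " + timeoutMillis + " ms");
                    return;
                }
                taskThread.interrupt();
                // Wait at most one interval (never past the deadline) for the thread to exit.
                // join(0) would wait forever, so the wait is clamped to at least 1 ms.
                long waitMillis = Math.min(intervalMillis, TimeUnit.NANOSECONDS.toMillis(remainingNanos));
                taskThread.join(Math.max(1, waitMillis));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // the watchdog itself was stopped
        }
    }

    /** Demo: returns the fatal-error message, or null if the task exited in time. */
    static String demo(boolean ignoreInterrupts) {
        AtomicBoolean keepRunning = new AtomicBoolean(true);
        Thread task = new Thread(() -> {
            while (keepRunning.get()) {
                try {
                    Thread.sleep(5);
                } catch (InterruptedException e) {
                    if (!ignoreInterrupts) {
                        return; // cooperative task: exit on interrupt
                    }
                }
            }
        }, "demo-task");
        task.setDaemon(true);
        task.start();

        AtomicReference<String> fatal = new AtomicReference<>();
        Thread watchdog = new Thread(new CancellationWatchdog(task, 10, 200, fatal::set));
        watchdog.start();
        try {
            watchdog.join();
            keepRunning.set(false); // release the stubborn task so the demo leaks no thread
            task.join();
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
        return fatal.get();
    }

    public static void main(String[] args) {
        System.out.println("stubborn task:    " + demo(true));
        System.out.println("cooperative task: " + demo(false));
    }
}
```

Running the escalation in its own thread means a task that swallows interrupts cannot also block the code that decides to give up on it; in the design the issue proposes, the fatal-error callback is the point where the TaskManager would terminate itself.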



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
