flink-issues mailing list archives

From "ASF GitHub Bot (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (FLINK-9099) Failing allocated slots not noticed
Date Tue, 27 Mar 2018 19:26:00 GMT

    [ https://issues.apache.org/jira/browse/FLINK-9099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16416124#comment-16416124 ]

ASF GitHub Bot commented on FLINK-9099:
---------------------------------------

Github user GJL commented on a diff in the pull request:

    https://github.com/apache/flink/pull/5775#discussion_r177544274
  
    --- Diff: flink-runtime/src/test/java/org/apache/flink/runtime/executiongraph/ExecutionGraphSchedulingTest.java ---
    @@ -465,6 +464,58 @@ public void testSchedulingOperationCancellationWhenCancel() throws Exception {
     		assertThat(executionGraph.getTerminationFuture().get(), is(JobStatus.CANCELED));
     	}
     
    +	@Nonnull
    +	private TestingLogicalSlot createTestingSlot(@Nullable CompletableFuture<?> releaseFuture) {
    +		return new TestingLogicalSlot(
    +			new LocalTaskManagerLocation(),
    +			new SimpleAckingTaskManagerGateway(),
    +			0,
    +			new AllocationID(),
    +			new SlotRequestId(),
    +			new SlotSharingGroupId(),
    +			releaseFuture);
    +	}
    +
    +	/**
    +	 * Tests that a partially completed eager scheduling operation fails if a
    +	 * completed slot is released. See FLINK-9099.
    +	 */
    +	@Test
    +	public void testSlotReleasingFailsSchedulingOperation() throws Exception {
    +		final int parallelism = 2;
    +
    +		final JobVertex jobVertex = new JobVertex("Testing job vertex");
    +		jobVertex.setInvokableClass(NoOpInvokable.class);
    +		jobVertex.setParallelism(parallelism);
    +		final JobGraph jobGraph = new JobGraph(jobVertex);
    +		jobGraph.setAllowQueuedScheduling(true);
    +		jobGraph.setScheduleMode(ScheduleMode.EAGER);
    +
    +		final ProgrammedSlotProvider slotProvider = new ProgrammedSlotProvider(2);
    --- End diff --
    
    Replace `2` with `parallelism`?


> Failing allocated slots not noticed
> -----------------------------------
>
>                 Key: FLINK-9099
>                 URL: https://issues.apache.org/jira/browse/FLINK-9099
>             Project: Flink
>          Issue Type: Bug
>          Components: Distributed Coordination
>    Affects Versions: 1.5.0
>            Reporter: Till Rohrmann
>            Assignee: Till Rohrmann
>            Priority: Critical
>              Labels: flip-6
>             Fix For: 1.5.0
>
>
> When allocating slots for eager scheduling, allocated slots can fail after they are assigned to the {{Execution}} (e.g. due to a {{TaskExecutor}} heartbeat timeout). If some slot futures are still uncompleted, the failure is not noticed, because the {{Execution}} is assigned to the {{LogicalSlot}} only after all slot futures have completed. The allocated slot failure therefore goes unnoticed until all slot futures have completed.
> In order to speed up failures, we should directly assign the {{Execution}} to the {{LogicalSlot}} once the slot is assigned to the {{Execution}}.
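
The difference between the two assignment strategies described in the issue can be illustrated with a small, self-contained sketch. This is not Flink code: ToySlot, assignPayload, and failureNoticed are hypothetical stand-ins for illustration, and only java.util.concurrent.CompletableFuture is a real API. The sketch requests two slots, completes only the first, and then fails it.

```java
import java.util.concurrent.CompletableFuture;

/** Illustrative sketch only; none of these types are Flink classes. */
class SlotAssignmentSketch {

    /** Toy slot: a failure is only visible if a payload (the execution) is attached. */
    static final class ToySlot {
        private Runnable payload; // hypothetical callback that fails the execution

        void assignPayload(Runnable failExecution) {
            this.payload = failExecution;
        }

        /** Fails the slot; returns true if an attached payload was notified. */
        boolean fail() {
            if (payload == null) {
                return false; // nobody is listening, so the failure goes unnoticed
            }
            payload.run();
            return true;
        }
    }

    /**
     * Requests two slots, completes only the first one, then fails it.
     *
     * @param assignDirectly attach the payload as soon as each slot future completes
     * @return whether the execution noticed the slot failure
     */
    static boolean failureNoticed(boolean assignDirectly) {
        boolean[] executionFailed = {false};
        CompletableFuture<ToySlot> first = new CompletableFuture<>();
        CompletableFuture<ToySlot> second = new CompletableFuture<>();

        if (assignDirectly) {
            // Proposed behaviour: attach the payload per slot, on completion.
            first.thenAccept(slot -> slot.assignPayload(() -> executionFailed[0] = true));
            second.thenAccept(slot -> slot.assignPayload(() -> executionFailed[0] = true));
        } else {
            // Reported behaviour: attach payloads only once all futures are done.
            CompletableFuture.allOf(first, second).thenRun(() -> {
                first.join().assignPayload(() -> executionFailed[0] = true);
                second.join().assignPayload(() -> executionFailed[0] = true);
            });
        }

        ToySlot slot = new ToySlot();
        first.complete(slot); // the second future stays uncompleted
        slot.fail();          // e.g. a heartbeat timeout on the slot's owner
        return executionFailed[0];
    }
}
```

Under these assumptions, failureNoticed(false) leaves the failure unobserved because the second future never completes, while failureNoticed(true) reports it immediately.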



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
