aurora-reviews mailing list archives

From David McLaughlin <>
Subject Re: Review Request 65339: Fix infinite loop in Task State Machine due to TASK_UNKNOWN handling
Date Thu, 25 Jan 2018 09:34:00 GMT

(Updated Jan. 25, 2018, 9:33 a.m.)

Review request for Aurora, Jordan Ly and Santhosh Kumar Shanmugham.

Bugs: AURORA-1966

Repository: aurora


As reported in, Mesos sends a TASK_UNKNOWN
when we try to kill (or reconcile) tasks that are unknown. On master, this leads to an infinite
loop. The sequence of events is:

2) We react to the restarting or terminal -> PARTITIONED state transition by telling Mesos
"that is a bad state transition, that task should be dead".
3) Mesos replies that the task is TASK_UNKNOWN.
4) GO TO 1

AURORA-1966 describes just one case of this happening, but there are many other legitimate
paths into the same loop.

This patch cleans up the logic. The two main changes:

1) Do not allow ASSIGNED -> PARTITIONED. This is not really related to this bug, but I
found this logic error during debugging. ASSIGNED is a transient state and is subject to the
transient task timeout in the Scheduler, so we should not attempt to move to PARTITIONED during
that window. 
2) Do not try to kill tasks we think are terminal when Mesos tells us they are unknown. Originally
we did this because "manageTerminalTasks" is also used for restarting tasks, but in both
cases it never makes sense to respond to "I don't know about that task" with a request to
kill it.
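
For reviewers who want the intent of the two changes in one place, here is a minimal,
standalone sketch. It is not the code in this diff: the class, enums, and method names
below are hypothetical, the state set is a simplified subset, and it assumes that
TASK_UNKNOWN is what drives a task toward PARTITIONED on the scheduler side.

```java
import java.util.EnumSet;
import java.util.Set;

public final class TaskUnknownSketch {
  // Simplified, hypothetical subset of scheduler-side task states.
  enum State { ASSIGNED, RUNNING, RESTARTING, KILLED, FINISHED }

  // Simplified, hypothetical subset of statuses reported by Mesos.
  enum MesosStatus { TASK_RUNNING, TASK_UNKNOWN }

  // What the scheduler decides to do in response to a status update.
  enum Action { NONE, KILL, MOVE_TO_PARTITIONED }

  private static final Set<State> TERMINAL = EnumSet.of(State.KILLED, State.FINISHED);

  static Action onStatus(State current, MesosStatus status) {
    boolean terminalOrRestarting = TERMINAL.contains(current) || current == State.RESTARTING;

    if (status == MesosStatus.TASK_UNKNOWN) {
      if (terminalOrRestarting) {
        // Change 2: Mesos does not know the task, so asking it to kill the task
        // would only produce another TASK_UNKNOWN and restart the loop.
        return Action.NONE;
      }
      if (current == State.ASSIGNED) {
        // Change 1: ASSIGNED is transient and already covered by the transient
        // task timeout, so do not move to PARTITIONED from here.
        return Action.NONE;
      }
      // Assumption: other live states may still become PARTITIONED.
      return Action.MOVE_TO_PARTITIONED;
    }

    // Assumption: a task we consider dead or restarting that Mesos still reports
    // as running is the case where a kill request remains meaningful.
    return terminalOrRestarting ? Action.KILL : Action.NONE;
  }

  public static void main(String[] args) {
    for (State s : State.values()) {
      System.out.println(s + " + TASK_UNKNOWN -> " + onStatus(s, MesosStatus.TASK_UNKNOWN));
    }
  }
}
```

The point of the sketch is only that the TASK_UNKNOWN branch never returns KILL, which is
what breaks the cycle described above.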

Diffs (updated)

  src/main/java/org/apache/aurora/scheduler/state/ b8ba5da729fcf5965b577c23e3062e5607bd07e7

  src/test/java/org/apache/aurora/scheduler/state/ 3d98fe651ad2b89a03044e8a06953a0cea876321




./gradlew test

Verified this fixes the issue reported in AURORA-1966 by forcing LaunchException in OfferManagerImpl
in my vagrant image and viewing logs.


David McLaughlin
