aurora-issues mailing list archives

From "Joshua Cohen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AURORA-1614) Failed sandbox initialization can cause tasks to go LOST
Date Thu, 11 Feb 2016 16:55:18 GMT

    [ https://issues.apache.org/jira/browse/AURORA-1614?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15143024#comment-15143024 ]

Joshua Cohen commented on AURORA-1614:
--------------------------------------

https://reviews.apache.org/r/43486/

> Failed sandbox initialization can cause tasks to go LOST
> --------------------------------------------------------
>
>                 Key: AURORA-1614
>                 URL: https://issues.apache.org/jira/browse/AURORA-1614
>             Project: Aurora
>          Issue Type: Bug
>          Components: Executor
>            Reporter: Joshua Cohen
>            Assignee: Joshua Cohen
>            Priority: Minor
>
> When we initialize the sandbox, we only catch Sandbox-specific error types, meaning that
> if an unexpected error is raised, the executor just hangs until the timeout is exceeded, at
> which point the task goes LOST.
> We should instead broadly catch exceptions raised during sandbox initialization and quickly
> fail tasks.
> Additionally, the {{DockerDirectorySandbox}} was not properly catching errors raised
> when creating/symlinking, which led to the above problem in the event of a misconfiguration.
> In practice this issue shouldn't have occurred in normal usage, but it made development slow
> until I tracked down what was causing the tasks to just hang.
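
The description above amounts to adding a catch-all handler next to the sandbox-specific one. The following is a minimal sketch of that idea only, not the change under review: SandboxError, initialize_sandbox, and the two failing callables are hypothetical names made up for illustration, and the real executor code may be structured differently.

```python
class SandboxError(Exception):
    """Hypothetical stand-in for the sandbox-specific error type."""


def initialize_sandbox(create_sandbox):
    """Call create_sandbox() and return (ok, reason).

    Any exception is turned into a failure reason so the caller can
    fail the task right away instead of waiting for a timeout.
    """
    try:
        create_sandbox()
    except SandboxError as e:
        # The error type that was already being handled.
        return False, "Failed to initialize sandbox: %s" % e
    except Exception as e:
        # The broad catch: unexpected errors also fail fast.
        return False, "Unknown exception initializing sandbox: %s" % e
    return True, None


def expected_failure():
    # Hypothetical: a failure the sandbox code reports itself.
    raise SandboxError("could not create directory")


def misconfigured_symlink():
    # Hypothetical: an unexpected error, e.g. from a bad symlink target.
    raise OSError("symlink target missing")


if __name__ == "__main__":
    for fn in (lambda: None, expected_failure, misconfigured_symlink):
        print(initialize_sandbox(fn))
```

With only the first except clause, the OSError raised by misconfigured_symlink would propagate out of initialize_sandbox; with the second clause it becomes an ordinary failure reason that the caller can report.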



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
