aurora-issues mailing list archives

From "Bill Farner (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (AURORA-728) Executor does not handle announcer errors properly
Date Fri, 19 Sep 2014 18:38:37 GMT

     [ https://issues.apache.org/jira/browse/AURORA-728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bill Farner updated AURORA-728:
-------------------------------
    Assignee: Zameer Manji

> Executor does not handle announcer errors properly
> --------------------------------------------------
>
>                 Key: AURORA-728
>                 URL: https://issues.apache.org/jira/browse/AURORA-728
>             Project: Aurora
>          Issue Type: Bug
>            Reporter: Stephan Erb
>            Assignee: Zameer Manji
>
> Failures in the announcer cause Mesos and Aurora to fall out of sync.
> Consider the following stacktrace:
> {code}
> Traceback (most recent call last):
>   File "/root/.pex/install/twitter.common.exceptions-0.3.0-py2-none-any.whl.aa74e2e8535b1ea39bf9512cf70dba3e5aea7b1b/twitter.common.exceptions-0.3.0-py2-none-any.whl/twitter/common/exceptions/__init__.py", line 126, in _excepting_run
>     self.__real_run(*args, **kw)
>   File "/root/.pex/install/twitter.common.concurrent-0.3.0-py2-none-any.whl.3c9a3bf0ac76acff13a6803a37138bc9f18e54c7/twitter.common.concurrent-0.3.0-py2-none-any.whl/twitter/common/concurrent/deferred.py", line 43, in run
>     self._closure()
>   File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/aurora_executor.py", line 258, in <lambda>
>   File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/aurora_executor.py", line 121, in _run
>   File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/aurora_executor.py", line 161, in _start_status_manager
>   File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/common/announcer.py", line 74, in from_assigned_task
>   File "/opt/thermos/bin/thermos_executor.pex/apache/aurora/executor/common/announcer.py", line 100, in make_serverset
>   File "/root/.pex/install/kazoo-1.3.1-py2-none-any.whl.261c1cd5b2337063b238f0c52eeed45a1df90891/kazoo-1.3.1-py2-none-any.whl/kazoo/client.py", line 475, in start
>     raise self.handler.timeout_exception("Connection time-out")
> kazoo.handlers.threading.TimeoutError: Connection time-out
> {code}
> *Current behaviour:* The executor dies. Mesos still considers the task RUNNING, whereas Aurora will eventually consider the task LOST.
> *Expected behaviour:* The executor catches the exception and dispatches TASK_LOST or TASK_FAILED.
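
The expected behaviour could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual Aurora executor code: `RecordingDriver`, `AnnouncerTimeout`, `start_announcer` and `launch_task` are hypothetical stand-ins (the real failure is the `kazoo.handlers.threading.TimeoutError` shown in the traceback, and the real status update would go through the Mesos executor driver).

```python
# Hypothetical sketch: none of these names are the real Aurora or Mesos API.


class AnnouncerTimeout(Exception):
    """Stand-in for the kazoo TimeoutError seen in the traceback."""


class RecordingDriver(object):
    """Minimal fake driver that records the status updates it is asked to send."""

    def __init__(self):
        self.updates = []

    def send_status_update(self, task_id, state, message=""):
        self.updates.append((task_id, state, message))


def start_announcer():
    """Simulates the announcer failing to reach ZooKeeper."""
    raise AnnouncerTimeout("Connection time-out")


def launch_task(driver, task_id, start_announcer_fn):
    """Start the announcer; report a terminal state instead of dying on failure."""
    try:
        start_announcer_fn()
    except Exception as e:
        # Report a terminal state so Mesos and Aurora agree on the outcome.
        driver.send_status_update(task_id, "TASK_FAILED", "announcer failed: %s" % e)
        return False
    driver.send_status_update(task_id, "TASK_RUNNING")
    return True
```

The point of the sketch is that the announcer start-up is wrapped so that an exception results in an explicit terminal status update rather than an executor that exits silently while Mesos still believes the task is RUNNING.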



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
