airflow-commits mailing list archives

From "Bolke de Bruin (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-695) Retries do not execute because dagrun is in FAILED state
Date Sun, 18 Dec 2016 19:32:59 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759310#comment-15759310 ]

Bolke de Bruin commented on AIRFLOW-695:
----------------------------------------

I did some digging. I cannot replicate the behavior with the SequentialExecutor, but I can
with the LocalExecutor (I didn't try with Celery). It seems that the task is still part of
"self.running" when it is re-queued. In this state it will not be run again.

The executors have not been updated recently, so the issue must be in the calling functions.
I haven't figured that out yet. The state change of the task should be caught by the "heartbeat"
method calling the "sync" method of the executor, and then the task should be removed from
"self.running". It seems it isn't.
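
To illustrate the suspected mechanism, here is a minimal sketch. It is a toy model and not the
actual executor code: the class "ToyExecutor", the "report_result" hook, the "call_sync" flag and
the "retry_is_accepted" helper are all made up for illustration. Only the names "self.running",
"heartbeat" and "sync" come from the description above.

```python
class ToyExecutor:
    """Simplified stand-in for an executor; NOT Airflow's actual class."""

    def __init__(self):
        self.queued = {}      # key -> command waiting to start
        self.running = set()  # keys the executor believes are in flight
        self.finished = {}    # key -> final state reported by a worker
        self.started = []     # every command that was actually started

    def queue_command(self, key, command):
        # A key that is already queued or running is silently skipped.
        if key in self.queued or key in self.running:
            return False
        self.queued[key] = command
        return True

    def report_result(self, key, state):
        # Hypothetical hook: a worker records that the task attempt ended.
        self.finished[key] = state

    def sync(self):
        # Drop every finished key from the running set.
        for key in list(self.finished):
            self.running.discard(key)
            del self.finished[key]

    def heartbeat(self, call_sync=True):
        # Start everything that is queued, then optionally reconcile state.
        for key, command in list(self.queued.items()):
            del self.queued[key]
            self.running.add(key)
            self.started.append(command)
        if call_sync:
            self.sync()


def retry_is_accepted(call_sync):
    """Return True if a retry of the same task key can be queued again."""
    ex = ToyExecutor()
    key = ("test_retry_handling", "test_retry_handling_op")
    ex.queue_command(key, "attempt 1")
    ex.heartbeat(call_sync=call_sync)       # first attempt starts
    ex.report_result(key, "up_for_retry")   # first attempt fails
    ex.heartbeat(call_sync=call_sync)       # state change should be caught here
    return ex.queue_command(key, "attempt 2")
```

In this model, retry_is_accepted(True) lets the second attempt through, while
retry_is_accepted(False) leaves the key in "running" and the retry is dropped, which matches
the symptom seen with the LocalExecutor.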

[~aoen] [~pauly] Maybe you guys have a clue.

> Retries do not execute because dagrun is in FAILED state
> --------------------------------------------------------
>
>                 Key: AIRFLOW-695
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-695
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>            Reporter: Harvey Xia
>            Priority: Blocker
>              Labels: executor, scheduler
>
> Currently on the latest master commit (15ff540ecd5e60e7ce080177ea3ea227582a4672), running
> on the LocalExecutor, retries on tasks do not execute because the state of the corresponding
> dagrun changes to FAILED. The task instance then gets blocked because "Task instance's dagrun
> was not in the 'running' state but in the state 'failed'," the error message produced by the
> following lines: https://github.com/apache/incubator-airflow/blob/master/airflow/ti_deps/deps/dagrun_exists_dep.py#L48-L50
> This error can be reproduced with the following simple DAG:
> {code:title=DAG.py|borderStyle=solid}
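> # Imports are assumed for the Airflow version current at the time of the report.
> import datetime
>
> from airflow import models
> from airflow.operators.bash_operator import BashOperator
>
> # Minimal DAG whose only task always fails, so the retry path is exercised.
> dag = models.DAG(dag_id='test_retry_handling')
> task = BashOperator(
>     task_id='test_retry_handling_op',
>     bash_command='exit 1',  # always exits non-zero
>     retries=1,
>     retry_delay=datetime.timedelta(minutes=1),
>     dag=dag,
>     owner='airflow',
>     start_date=datetime.datetime(2016, 2, 1, 0, 0, 0))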
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
