airflow-commits mailing list archives

From "Bolke de Bruin (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (AIRFLOW-695) Retries do not execute because dagrun is in FAILED state
Date Sun, 18 Dec 2016 20:03:59 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759359#comment-15759359
] 

Bolke de Bruin edited comment on AIRFLOW-695 at 12/18/16 8:03 PM:
------------------------------------------------------------------

Ok, I think I figured out the issue. The scheduler checks the task instances without taking
into account whether the executor has already reported back. In this case the executor reports
back several iterations later. Because tasks will not re-enter the queue while they are
considered running, the task state remains "queued" indefinitely, in limbo between the
scheduler and the executor.
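
A minimal, hypothetical sketch of that limbo (a toy model, not Airflow's actual code; all names are invented for illustration): the scheduler refuses to re-queue a task it still considers active, while the executor report that would clear that state only arrives several iterations later, so nothing ever moves the task out of "queued".

```python
# Hypothetical toy model of the scheduler/executor limbo described above.
# All names here are invented for illustration; this is not Airflow code.

QUEUED, RUNNING, UP_FOR_RETRY = "queued", "running", "up_for_retry"

def scheduler_iteration(task_state, executor_reported):
    """One scheduler pass: only (re)queues tasks it does not consider active."""
    if task_state in (QUEUED, RUNNING) and not executor_reported:
        # Scheduler believes the executor still owns this task, so it does nothing.
        return task_state
    if task_state == UP_FOR_RETRY:
        return QUEUED
    return task_state

# The executor's report arrives "several iterations later" -- within the
# window simulated here, never -- so the task stays queued on every pass.
state = QUEUED
for _ in range(5):
    state = scheduler_iteration(state, executor_reported=False)

print(state)  # stays "queued": limbo between scheduler and executor
```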

> Retries do not execute because dagrun is in FAILED state
> --------------------------------------------------------
>
>                 Key: AIRFLOW-695
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-695
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>            Reporter: Harvey Xia
>            Priority: Blocker
>              Labels: executor, scheduler
>
> Currently on the latest master commit (15ff540ecd5e60e7ce080177ea3ea227582a4672), running
on the LocalExecutor, task retries do not execute because the state of the corresponding
dagrun changes to FAILED. The task instance is then blocked with the error "Task instance's
dagrun was not in the 'running' state but in the state 'failed'," which is produced by the
following lines: https://github.com/apache/incubator-airflow/blob/master/airflow/ti_deps/deps/dagrun_exists_dep.py#L48-L50
> This error can be reproduced with the following simple DAG:
> {code:title=DAG.py|borderStyle=solid}
>         import datetime
>
>         from airflow import models
>         from airflow.operators.bash_operator import BashOperator
>
>         dag = models.DAG(dag_id='test_retry_handling')
>         task = BashOperator(
>             task_id='test_retry_handling_op',
>             bash_command='exit 1',
>             retries=1,
>             retry_delay=datetime.timedelta(minutes=1),
>             dag=dag,
>             owner='airflow',
>             start_date=datetime.datetime(2016, 2, 1, 0, 0, 0))
> {code}
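
The dependency that blocks the retry can be sketched roughly as follows (a simplified, assumed paraphrase of the check in the dagrun_exists_dep.py lines linked above, not the exact source):

```python
# Simplified, assumed paraphrase of the dagrun-state dependency check; the
# real implementation lives in airflow/ti_deps/deps/dagrun_exists_dep.py.

def dagrun_state_dep(dagrun_state):
    """Return (passed, reason) for a task instance whose dagrun has this state."""
    if dagrun_state != "running":
        return (False,
                "Task instance's dagrun was not in the 'running' state but in "
                "the state '%s'." % dagrun_state)
    return (True, None)

# A task marked up_for_retry inside a FAILED dagrun never passes this dep,
# so its retry is never scheduled.
passed, reason = dagrun_state_dep("failed")
```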



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
