airflow-commits mailing list archives

From "Stanislav Pak (JIRA)" <j...@apache.org>
Subject [jira] [Work started] (AIRFLOW-1463) Clear state of pending task when it fails due to DAG import error
Date Wed, 26 Jul 2017 00:28:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on AIRFLOW-1463 started by Stanislav Pak.
----------------------------------------------
> Clear state of pending task when it fails due to DAG import error
> -----------------------------------------------------------------
>
>                 Key: AIRFLOW-1463
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1463
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: cli
>         Environment: Ubuntu 14.04
> Airflow 1.8.0
> SQS-backed task queue, AWS RDS-backed metadata storage
> The DAG folder is synced by a script on each code push: an archive is downloaded from S3,
> unpacked, and moved into place, then an install script is run. The airflow executable is
> replaced with a symlink pointing to the latest version of the code; no Airflow processes
> are restarted.
>            Reporter: Stanislav Pak
>            Assignee: Stanislav Pak
>            Priority: Minor
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Our pipeline-related code is deployed almost simultaneously on all Airflow boxes: the
> scheduler+webserver box and the worker boxes. A common Python package is deployed on those
> boxes on every other code push (3-5 deployments per hour). Due to installation specifics,
> a DAG that imports a module from that package might fail to import. If the DAG import fails
> when a worker runs a task, the task is still removed from the queue but its state is not
> changed, so the task stays in the PENDING state forever.
> Besides the case described above, this can also happen because of DAG update lag in the
> scheduler: a task can be scheduled with the old DAG, and a worker can then run the task
> with a new DAG that fails to import.
> There might be other scenarios in which this happens.
> Proposal:
> Catch errors when importing the DAG on task run, and clear the task instance state if the
> import fails. This should fix transient issues of this kind.
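
For illustration only, the proposal could be sketched roughly as follows. This is not Airflow code: `TaskInstance`, `load_dag_module`, and `run_task` are hypothetical stand-ins, the "pending" state string mirrors the wording of this issue, and "clearing" the state is assumed here to mean resetting it to `None` so that a scheduler could pick the task up again.

```python
import importlib.util
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class TaskInstance:
    """Hypothetical stand-in for a task instance record in the meta store."""
    task_id: str
    state: Optional[str] = "pending"


def load_dag_module(path: str):
    """Import a DAG file by path; any import-time error propagates to the caller."""
    spec = importlib.util.spec_from_file_location("dag_module", path)
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load DAG file: {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # raises if the file or its imports are broken
    return module


def run_task(ti: TaskInstance, dag_path: str, execute: Callable) -> bool:
    """Run a task, clearing its state if the DAG file cannot be imported."""
    try:
        dag_module = load_dag_module(dag_path)
    except Exception:
        # Assumption: a cleared (None) state lets the task be scheduled again
        # instead of staying in "pending" forever.
        ti.state = None
        return False
    ti.state = "running"
    execute(dag_module)
    ti.state = "success"
    return True
```

Catching the broad `Exception` is deliberate in this sketch, since a half-installed package can surface as `ImportError`, `SyntaxError`, or something else at import time. A real fix would also need to persist the state change to the meta store.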



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
