airflow-dev mailing list archives

From "EKC (Erik Cederstrand)" <...@novozymes.com>
Subject Failing jobs not properly terminated
Date Tue, 28 Feb 2017 15:10:04 GMT
Hi folks,


We're experiencing an issue where a failing DAG is not properly terminated. The log file contains
a Python stack trace of the failure and then ends with:


    {base_task_runner.py:112} INFO - Running: ['bash', '-c', 'airflow run my_dag my_task 2017-02-22T00:00:00 --job_id 127022 --raw -sd DAGS_FOLDER/my_dag.py']

    [...]

    {jobs.py:2127} WARNING - State of this instance has been externally set to failed. Taking the poison pill. So long.


But the Celery process of the worker is never terminated:


    $ ps aux | grep 127022
    airflow  [...] python3 airflow run my_dag my_task 2017-02-22T00:00:00 --job_id 127022 --raw -sd DAGS_FOLDER/my_dag.py


Somehow, the scheduler does not see the job as failed, so the Celery queue quickly fills up
with dead jobs and nothing else gets worked on.
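
For what it's worth, this is roughly how I'm checking what the scheduler actually sees in the metadata DB. It's just a sketch: the connection string is a placeholder for our setup, and it assumes the stock job/task_instance schema.

    # Rough sketch: compare what the metadata DB records for the job vs. the
    # task instance. The connection string is a placeholder; the job and
    # task_instance columns queried here are from the stock Airflow schema.
    import sqlalchemy as sa

    engine = sa.create_engine("postgresql://airflow:airflow@localhost/airflow")  # placeholder

    with engine.connect() as conn:
        job = conn.execute(
            sa.text("SELECT state, latest_heartbeat FROM job WHERE id = :id"),
            {"id": 127022},
        ).fetchone()
        ti = conn.execute(
            sa.text(
                "SELECT state FROM task_instance "
                "WHERE dag_id = :dag AND task_id = :task AND execution_date = :ed"
            ),
            {"dag": "my_dag", "task": "my_task", "ed": "2017-02-22 00:00:00"},
        ).fetchone()

    # Print both states so the mismatch (if any) is easy to spot.
    print("job 127022:   ", job)
    print("task_instance:", ti)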


I had a look at jobs.py, and I have a suspicion that the self.terminating flag is never
persisted to the database (see https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L2122).
Is this correct? Everything else I can see in that file does a session.merge() to persist values.
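
To illustrate what I mean, here's a generic SQLAlchemy toy (not Airflow code, all names made up): a flag flipped in memory on one session's object stays invisible to a second session until it is merged and committed back.

    # Generic SQLAlchemy toy, not Airflow code: the "worker" flips a flag in
    # memory, but the "scheduler" session keeps seeing the old value until the
    # change is merged and committed. All names here are made up.
    import sqlalchemy as sa
    from sqlalchemy.ext.declarative import declarative_base
    from sqlalchemy.orm import sessionmaker

    Base = declarative_base()

    class Job(Base):
        __tablename__ = "job"
        id = sa.Column(sa.Integer, primary_key=True)
        state = sa.Column(sa.String, default="running")

    engine = sa.create_engine("sqlite://")
    Base.metadata.create_all(engine)
    Session = sessionmaker(bind=engine)

    worker = Session()       # plays the role of the task's job
    scheduler = Session()    # plays the role of the scheduler

    worker.add(Job(id=127022))
    worker.commit()

    job = worker.query(Job).filter_by(id=127022).one()
    job.state = "failed"     # analogous to self.terminating = True: in-memory only

    print(scheduler.query(Job.state).filter_by(id=127022).scalar())  # running

    worker.merge(job)        # the session.merge() step I suspect is missing
    worker.commit()
    scheduler.expire_all()

    print(scheduler.query(Job.state).filter_by(id=127022).scalar())  # failed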


Kind regards,

Erik Cederstrand

