airflow-dev mailing list archives

From "EKC (Erik Cederstrand)" <>
Subject Failing jobs not properly terminated
Date Tue, 28 Feb 2017 15:10:04 GMT
Hi folks,

We're experiencing an issue where a failing DAG is not properly terminated. The log file contains
a Python stack trace of the failure and then ends with:

    {} INFO - Running: ['bash', '-c', 'airflow run my_dag my_task 2017-02-22T00:00:00 --job_id 127022 --raw -sd DAGS_FOLDER/']

    {} WARNING - State of this instance has been externally set to failed. Taking the poison pill. So long.

But the Celery process of the worker is never terminated:

    $ ps aux | grep 127022
    airflow  [...] python3 airflow run my_dag my_task 2017-02-22T00:00:00 --job_id 127022 --raw -sd DAGS_FOLDER/

Somehow, the scheduler does not see the job as failed, so the Celery queue quickly fills up
with dead jobs and nothing else gets worked on.

I had a look at the code, and I have a suspicion that the self.terminating flag is never persisted
to the database. Is this correct? Everything else I can see in that file does a session.merge()
to persist its changes.
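To illustrate what I mean about the flag not being persisted: with SQLAlchemy, mutating an
attribute only changes the in-memory object; the change reaches the database once the object is
merged/added into a session and the session is committed. A minimal standalone sketch (the Job
model here is a hypothetical stand-in, not Airflow's actual model):

```python
from sqlalchemy import create_engine, Column, Integer, Boolean
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()

# Hypothetical stand-in for the job row; illustrative only.
class Job(Base):
    __tablename__ = "job"
    id = Column(Integer, primary_key=True)
    terminating = Column(Boolean, default=False)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)

session = Session()
session.add(Job(id=127022))
session.commit()

# Load the row, flip the flag in memory...
job = session.query(Job).filter_by(id=127022).one()
job.terminating = True

# ...but only merge() + commit() makes the change durable,
# which is what the rest of the file does and this flag apparently doesn't.
session.merge(job)
session.commit()
```

If the commit step is skipped, any other process reading the table still sees
terminating=False, which would explain the scheduler never noticing.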

Kind regards,

Erik Cederstrand
