airflow-commits mailing list archives

From "ASF subversion and git services (Jira)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-4797) Zombie detection and killing is not deterministic
Date Thu, 17 Oct 2019 11:44:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-4797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953657#comment-16953657 ]

ASF subversion and git services commented on AIRFLOW-4797:
----------------------------------------------------------

Commit b0ec8716f0ecddd0cb6621bc981ccba12e74cabb in airflow's branch refs/heads/v1-10-stable
from Kevin Yang
[ https://gitbox.apache.org/repos/asf?p=airflow.git;h=b0ec871 ]

[AIRFLOW-4797] Use same zombies in all DAG file processors

(cherry picked from commit cb0dbe309b518813529ddf7545ae942e5767f5e5)


> Zombie detection and killing is not deterministic
> -------------------------------------------------
>
>                 Key: AIRFLOW-4797
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-4797
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: 1.10.3
>            Reporter: Stefan Seelmann
>            Assignee: Stefan Seelmann
>            Priority: Major
>             Fix For: 1.10.4
>
>
> Zombie detection and killing is done within the DAG file processing loop. Within one
> iteration only a subset of the DAG files is processed (config scheduler.max_threads). The
> loop sleeps for the rest of the second until the next iteration runs, which processes the
> next subset of DAG files. The function that gets zombie task instances returns zombies only
> once every 10 seconds; otherwise it returns an empty list.
> That means zombies are detected only in every 10th iteration of the DAG file processing
> loop, and they are killed only if the zombie task belongs to one of the DAG files of the
> current iteration.
> We ran into the worst-case scenario with max_threads=2 and 20 DAGs. In such a scenario
> only zombies of the same 2 DAGs are killed. (As loop iterations are not exactly 1s, it shifts
> slowly and eventually the zombies are killed, but in one example it took 33 minutes.)
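
The worst case described above can be reproduced with a small simulation. With 20 DAG files and 2 processed per iteration, a full pass takes 10 iterations, which lines up with the 10-second zombie interval. The sketch below is not Airflow code: the function simulate and all of its parameters are hypothetical, and it assumes every loop iteration takes exactly one second and that DAG files are processed round-robin.

```python
from typing import Dict


def simulate(num_dags: int = 20, max_threads: int = 2,
             zombie_interval: int = 10, iterations: int = 100) -> Dict[int, int]:
    """Count, per DAG file, the iterations in which its zombies could be killed.

    Assumption: each iteration lasts exactly one second, so zombies are
    returned only on every `zombie_interval`-th iteration.
    """
    checked = {dag: 0 for dag in range(num_dags)}
    next_dag = 0
    for iteration in range(iterations):
        # Round-robin: pick the next `max_threads` DAG files.
        batch = [(next_dag + i) % num_dags for i in range(max_threads)]
        next_dag = (next_dag + max_threads) % num_dags
        # Zombies are only visible in every `zombie_interval`-th iteration,
        # and only the DAG files in the current batch can act on them.
        if iteration % zombie_interval == 0:
            for dag in batch:
                checked[dag] += 1
    return checked


if __name__ == "__main__":
    counts = simulate()
    covered = sorted(dag for dag, count in counts.items() if count > 0)
    print(covered)  # prints [0, 1]
```

Under these assumptions only the first two DAG files ever see zombies. Calling simulate(num_dags=21) breaks the alignment, and the set of covered DAG files grows over time, which mirrors the slow drift mentioned in the report.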



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
