airflow-commits mailing list archives

From "ASF subversion and git services (Jira)" <>
Subject [jira] [Commented] (AIRFLOW-5660) Scheduler becomes unresponsive when processing large DAGs on kubernetes.
Date Fri, 13 Dec 2019 00:56:00 GMT


ASF subversion and git services commented on AIRFLOW-5660:

Commit b887bc123ae9602a9f7351bb07751f78e0cd8cc0 in airflow's branch refs/heads/v1-10-test
from Aditya Vishwakarma

[AIRFLOW-5660] Attempt to find the task in DB from Kubernetes pod labels (#6340)

Try to find the task in DB before regressing to searching every task, 
and explicitly warn about the performance regressions.

Co-Authored-By: Ash Berlin-Taylor <>
(cherry picked from commit 0f9983f472025621a09fe69a5fcb458663c05847)
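The commit message describes a specific lookup order. First, try a direct point lookup using the values taken from the pod labels. Only if that misses, fall back to scanning every task for that execution date, and log a warning. Below is a minimal Python sketch of that order.

Everything in it is hypothetical and simplified:
- `FakeTaskTable` stands in for the metadata database.
- `make_safe_label` is a simplified sanitizer, not Airflow's real one.
- The execution-date label is assumed to be already parsed back into a `datetime`.

The `rows_loaded` counter makes the cost difference described in the issue visible.

```python
import hashlib
import logging
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional, Tuple

log = logging.getLogger(__name__)

MAX_LABEL_LEN = 63  # Kubernetes caps label values at 63 characters

TaskKey = Tuple[str, str, datetime]


def make_safe_label(value: str) -> str:
    """Hypothetical, simplified label sanitizer (not Airflow's implementation)."""
    safe = re.sub(r"[^A-Za-z0-9_.-]", "", value)
    if safe != value or len(safe) > MAX_LABEL_LEN:
        # Truncate and append a short hash so different inputs stay distinct.
        suffix = hashlib.sha1(value.encode()).hexdigest()[:8]
        safe = safe[: MAX_LABEL_LEN - 9] + "-" + suffix
    return safe


@dataclass(frozen=True)
class TaskRow:
    dag_id: str
    task_id: str
    execution_date: datetime


class FakeTaskTable:
    """In-memory stand-in for the task instance table that counts rows loaded."""

    def __init__(self, rows):
        self._by_key = {(r.dag_id, r.task_id, r.execution_date): r for r in rows}
        self.rows_loaded = 0

    def get(self, dag_id: str, task_id: str, execution_date: datetime) -> Optional[TaskRow]:
        # Point lookup on the full key: loads at most one row.
        row = self._by_key.get((dag_id, task_id, execution_date))
        if row is not None:
            self.rows_loaded += 1
        return row

    def all_for_date(self, execution_date: datetime):
        # Loads every task with this execution date, like the original full scan.
        rows = [r for r in self._by_key.values() if r.execution_date == execution_date]
        self.rows_loaded += len(rows)
        return rows


def labels_to_key(labels: dict, table: FakeTaskTable) -> Optional[TaskKey]:
    dag_id = labels["dag_id"]
    task_id = labels["task_id"]
    execution_date = labels["execution_date"]  # assumed already parsed to datetime

    # Fast path: when sanitizing changed nothing, the labels are the real IDs.
    row = table.get(dag_id, task_id, execution_date)
    if row is not None:
        return (row.dag_id, row.task_id, row.execution_date)

    # Slow path: sanitize every task for that date and compare with the labels.
    log.warning("Falling back to scanning all tasks for %s; slow for large DAGs", execution_date)
    for row in table.all_for_date(execution_date):
        if make_safe_label(row.dag_id) == dag_id and make_safe_label(row.task_id) == task_id:
            return (row.dag_id, row.task_id, row.execution_date)
    return None


if __name__ == "__main__":
    date = datetime(2019, 12, 13)
    table = FakeTaskTable(TaskRow("big_dag", f"task_{i}", date) for i in range(10_000))
    for i in range(100):  # 100 tasks in progress
        labels_to_key({"dag_id": "big_dag", "task_id": f"task_{i}", "execution_date": date}, table)
    print(table.rows_loaded)  # 100
```

With the fast path, 100 in-progress tasks cost 100 point lookups per scheduler tick. Scanning a 10,000-task execution date for each of them would instead load 100 × 10,000 = 1,000,000 rows, which is the figure quoted in the issue. The scan is still needed when sanitizing changed the IDs, which is why the fallback stays.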

> Scheduler becomes unresponsive when processing large DAGs on kubernetes.
> ------------------------------------------------------------------------
>                 Key: AIRFLOW-5660
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor-kubernetes
>    Affects Versions: 1.10.5
>            Reporter: Aditya Vishwakarma
>            Assignee: Daniel Imberman
>            Priority: Major
>             Fix For: 1.10.7
> For very large DAGs (10,000+ tasks) and high parallelism, the scheduling loop can take more than 5-10 minutes.
> It seems that the `_labels_to_key` function in kubernetes_executor loads all tasks with a given execution date into memory, and it does so for every task in progress. So if 100 tasks of a DAG with 10,000 tasks are in progress, it loads a million tasks from the DB on every tick of the scheduler.
> A quick fix is to search for the task in the DB directly before falling back to a full scan. I can submit a PR for it.
> A proper fix requires persisting a mapping of (safe_dag_id, safe_task_id, dag_id, task_id, execution_date) somewhere, probably in the metadata database.
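The "proper fix" suggested in the issue can be sketched as a lookup table keyed on the sanitized label values. This is an illustration built on assumptions, not Airflow's schema:
- The `label_key_map` table, the helper functions, and the simplified `make_safe_label` are all hypothetical.
- An in-memory SQLite database stands in for the metadata database.
- The execution date is stored as an ISO-8601 string.

```python
import hashlib
import re
import sqlite3


def make_safe_label(value: str) -> str:
    """Hypothetical simplified sanitizer, standing in for Airflow's own."""
    safe = re.sub(r"[^A-Za-z0-9_.-]", "", value)
    if safe != value or len(safe) > 63:
        safe = safe[:54] + "-" + hashlib.sha1(value.encode()).hexdigest()[:8]
    return safe


# Hypothetical table mapping sanitized pod labels back to the real IDs.
SCHEMA = """
CREATE TABLE IF NOT EXISTS label_key_map (
    safe_dag_id TEXT NOT NULL,
    safe_task_id TEXT NOT NULL,
    execution_date TEXT NOT NULL,
    dag_id TEXT NOT NULL,
    task_id TEXT NOT NULL,
    PRIMARY KEY (safe_dag_id, safe_task_id, execution_date)
)
"""


def record_mapping(conn, dag_id, task_id, execution_date):
    """Store the mapping when a pod is launched with sanitized labels."""
    conn.execute(
        "INSERT OR REPLACE INTO label_key_map VALUES (?, ?, ?, ?, ?)",
        (make_safe_label(dag_id), make_safe_label(task_id), execution_date, dag_id, task_id),
    )


def lookup_mapping(conn, safe_dag_id, safe_task_id, execution_date):
    """Resolve pod labels with one primary-key lookup instead of a scan."""
    row = conn.execute(
        "SELECT dag_id, task_id FROM label_key_map "
        "WHERE safe_dag_id = ? AND safe_task_id = ? AND execution_date = ?",
        (safe_dag_id, safe_task_id, execution_date),
    ).fetchone()
    return None if row is None else (row[0], row[1], execution_date)
```

If the row is written when the pod is created, resolving a pod event costs one primary-key lookup no matter how many tasks the DAG has. Removing rows once tasks finish is left out of the sketch.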

This message was sent by Atlassian Jira
