airflow-commits mailing list archives

From "ASF subversion and git services (JIRA)" <>
Subject [jira] [Commented] (AIRFLOW-2430) Bad query patterns at scale prevent scheduler from starting
Date Sun, 13 May 2018 18:55:00 GMT


ASF subversion and git services commented on AIRFLOW-2430:

Commit 042c3f2aeec7e4da335ae900c5b7499250304175 in incubator-airflow's branch refs/heads/master
from [~gsilk]
[;h=042c3f2 ]

[AIRFLOW-2430] Extend query batching to additional slow queries

Closes #3324 from gsilk/batch-inserts

> Bad query patterns at scale prevent scheduler from starting
> -----------------------------------------------------------
>                 Key: AIRFLOW-2430
>                 URL:
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Gabriel Silk
>            Priority: Major
>             Fix For: 1.10.0, 2.0.0
> h2. Summary
> Certain queries executed by the scheduler do not scale well with the number of tasks
being operated on. Two example functions are:
>  * reset_state_for_orphaned_tasks
>  * _execute_task_instances
> Concretely, with a mere 75k tasks being operated on, the first query can take dozens
of minutes to run, blocking the scheduler from making progress.
> The cause is twofold:
> 1. As the query grows past a certain point, the MySQL planner will choose to do a full
table scan as opposed to using an index. I assume the same is true of Postgres.
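> 2. The query predicate size grows linearly in the number of tasks being operated on, thus
increasing the amount of work that needs to be done per row.
> In a sense, you’re left with an operation that scales as O(n^2): the full table scan touches
every row, and each row is checked against a predicate whose size grows with n.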
> h2. Proposed Fix
> It appears that one of these bad query patterns was fixed in [3547cbffd|] by
introducing a configurable batch size, which can be set via max_tis_per_query.
> I propose we extend the suggested fix to include other poorly-performing queries in the
scheduler.
> I’ve identified two queries that are directly affecting my work and included them in
the diff, though the same approach can be extended to more queries as we see fit.
> Thanks!
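
The batching idea in the quoted description can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the helper names chunked and reset_state_in_batches, the single-column task_instance table, and the use of SQLite are all assumptions made for the example. Only the max_tis_per_query name comes from the issue text. The point is that each statement carries at most max_tis_per_query ids in its predicate, so the predicate size stays bounded no matter how many tasks are processed in total.

```python
import sqlite3


def chunked(items, chunk_size):
    """Yield successive slices of `items` with at most `chunk_size` elements."""
    if chunk_size <= 0:
        # Treat a non-positive size as "no batching": everything in one chunk.
        chunk_size = max(len(items), 1)
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def reset_state_in_batches(conn, task_ids, max_tis_per_query):
    """Clear the state of the given tasks, issuing one bounded query per batch.

    Hypothetical helper: the table layout is simplified for illustration.
    """
    total = 0
    for batch in chunked(task_ids, max_tis_per_query):
        # One "?" placeholder per id, so the IN list never exceeds the batch size.
        placeholders = ",".join("?" for _ in batch)
        cur = conn.execute(
            "UPDATE task_instance SET state = NULL "
            f"WHERE task_id IN ({placeholders})",
            batch,
        )
        total += cur.rowcount
    conn.commit()
    return total


# Small in-memory demonstration with a simplified, hypothetical schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE task_instance (task_id TEXT PRIMARY KEY, state TEXT)")
ids = [f"task_{i}" for i in range(1200)]
conn.executemany(
    "INSERT INTO task_instance VALUES (?, 'queued')", [(i,) for i in ids]
)
reset_count = reset_state_in_batches(conn, ids, max_tis_per_query=500)
```

With 1200 ids and a batch size of 500, the sketch issues three UPDATE statements (500, 500, and 200 ids) instead of one statement with a 1200-element IN list. Whether a given planner then prefers an index over a full scan depends on the database and its statistics, so the batch size would need tuning against the real backend.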

This message was sent by Atlassian JIRA
