airflow-commits mailing list archives

From "Fokko Driesprong (JIRA)" <j...@apache.org>
Subject [jira] [Resolved] (AIRFLOW-2430) Bad query patterns at scale prevent scheduler from starting
Date Sun, 13 May 2018 18:55:00 GMT

     [ https://issues.apache.org/jira/browse/AIRFLOW-2430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong resolved AIRFLOW-2430.
---------------------------------------
       Resolution: Fixed
    Fix Version/s: 2.0.0
                   1.10.0

Issue resolved by pull request #3324
[https://github.com/apache/incubator-airflow/pull/3324]

> Bad query patterns at scale prevent scheduler from starting
> -----------------------------------------------------------
>
>                 Key: AIRFLOW-2430
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2430
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>            Reporter: Gabriel Silk
>            Priority: Major
>             Fix For: 1.10.0, 2.0.0
>
>
> h2. Summary
> Certain queries executed by the scheduler do not scale well with the number of tasks being operated on. Two example functions:
>  * reset_state_for_orphaned_tasks
>  * _execute_task_instances
>  
> Concretely, with a mere 75k tasks being operated on, the first query can take dozens of minutes to run, blocking the scheduler from making progress.
>  
> The cause is twofold:
> 1. As the query grows past a certain size, the MySQL planner chooses a full table scan instead of using an index. I assume the same is true of Postgres.
> 2. The query predicate size grows linearly in the number of tasks being operated on, increasing the amount of work that needs to be done per row.
>  
> In a sense, you’re left with an operation that scales as O(n^2) in the number of tasks.
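A rough back-of-the-envelope model of the O(n^2) claim, written as a Python sketch. It is an assumption, not a measurement: the unbatched case is modelled as a full table scan in which every scanned row is tested against every OR'ed predicate term (causes 1 and 2 above), and the batched case assumes each batch stays on the index, so it fetches only its own rows and tests each against that batch's terms. The function names are hypothetical and not part of Airflow.

```python
# Hypothetical cost model, not Airflow code: counts "row x predicate-term" checks.


def unbatched_comparisons(rows_scanned: int, n_keys: int) -> int:
    """Full table scan: every scanned row is tested against all n_keys OR'ed terms."""
    return rows_scanned * n_keys


def batched_comparisons(n_keys: int, batch_size: int) -> int:
    """Assumes each batch stays on the index, so it fetches only its own rows
    (at most batch_size) and tests each of them against at most batch_size terms."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    full_batches, remainder = divmod(n_keys, batch_size)
    return full_batches * batch_size * batch_size + remainder * remainder


# The table holds at least the n rows being queried, so the scan cost is >= n * n.
n = 75_000
print(unbatched_comparisons(n, n))  # 5625000000
print(batched_comparisons(n, 500))  # 37500000
```

Under this model, doubling n quadruples the unbatched cost but only roughly doubles the batched cost for a fixed batch size, which is the linear behaviour a bounded batch aims for.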
>  
> h2. Proposed Fix
> It appears that one of these bad query patterns was fixed in [3547cbffd|https://github.com/apache/incubator-airflow/commit/3547cbffdbffac2f98a8aa05526e8c9671221025] by introducing a configurable batch size, which can be set via max_tis_per_query.
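The batching idea behind max_tis_per_query can be sketched roughly as follows. This is a minimal illustration using Python's standard-library sqlite3 module, not the code from the commit or the pull request: the task_instance schema is a simplified stand-in, and the helpers chunks and fetch_states_batched are hypothetical. Each query carries at most max_per_query OR'ed key terms, so the predicate size per query stays bounded no matter how many task instances are being processed.

```python
import sqlite3
from typing import Iterator, List, Sequence, Tuple

# Hypothetical task-instance key: (dag_id, task_id, execution_date).
TIKey = Tuple[str, str, str]

# One predicate term per task instance, matching on the full composite key.
_TERM = "(dag_id = ? AND task_id = ? AND execution_date = ?)"


def chunks(items: Sequence[TIKey], size: int) -> Iterator[Sequence[TIKey]]:
    """Yield consecutive slices of at most `size` items."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_states_batched(
    conn: sqlite3.Connection, keys: Sequence[TIKey], max_per_query: int
) -> List[Tuple[str, str, str, str]]:
    """Look up the state of each key, issuing one bounded-size query per batch."""
    rows: List[Tuple[str, str, str, str]] = []
    for batch in chunks(keys, max_per_query):
        # The WHERE clause grows with the batch, never with len(keys).
        where = " OR ".join([_TERM] * len(batch))
        params = [value for key in batch for value in key]
        cursor = conn.execute(
            "SELECT dag_id, task_id, execution_date, state "
            f"FROM task_instance WHERE {where}",
            params,
        )
        rows.extend(cursor.fetchall())
    return rows


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # The composite primary key gives the planner an index to use for each batch.
    conn.execute(
        "CREATE TABLE task_instance (dag_id TEXT, task_id TEXT, "
        "execution_date TEXT, state TEXT, "
        "PRIMARY KEY (dag_id, task_id, execution_date))"
    )
    keys = [("example_dag", f"task_{i}", "2018-05-13") for i in range(250)]
    conn.executemany("INSERT INTO task_instance VALUES (?, ?, ?, 'queued')", keys)
    print(len(fetch_states_batched(conn, keys, max_per_query=100)))  # 250
```

The batch size trades the number of round trips against the size of each predicate, which is why exposing it as a setting rather than hard-coding it is useful.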
>  
> I propose we extend the suggested fix to include other poorly-performing queries in the scheduler.
>  
> I’ve identified two queries that are directly affecting my work and included them in the diff, though the same approach can be extended to more queries as we see fit.
>  
> Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
