airflow-commits mailing list archives

From "Chris Bandy (JIRA)" <j...@apache.org>
Subject [jira] [Comment Edited] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs
Date Sat, 07 Apr 2018 13:22:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16429355#comment-16429355 ]

Chris Bandy edited comment on AIRFLOW-2128 at 4/7/18 1:21 PM:
--------------------------------------------------------------

[~szmate1618] what is your {{scheduler.min_file_process_interval}} (or the {{AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL}}
environment variable) set to?
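
For reference, Airflow lets an {{airflow.cfg}} option be overridden by an environment variable named {{AIRFLOW__<SECTION>__<KEY>}} (double underscores). A minimal sketch for checking whether the override is set; the helper name {{airflow_env_var}} is hypothetical and not part of Airflow:

```python
import os


def airflow_env_var(section, key):
    """Build the environment variable name for a config option.

    Hypothetical helper for illustration only; it just applies the
    AIRFLOW__<SECTION>__<KEY> naming convention.
    """
    return "AIRFLOW__{}__{}".format(section.upper(), key.upper())


name = airflow_env_var("scheduler", "min_file_process_interval")
# os.environ.get returns None when the variable is not set.
value = os.environ.get(name)
print(name, "=", value if value is not None else "(not set in environment)")
```

If the variable is not set, the value comes from the {{[scheduler]}} section of {{airflow.cfg}} or from Airflow's built-in default.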



> 'Tall' DAGs scale worse than 'wide' DAGs
> ----------------------------------------
>
>                 Key: AIRFLOW-2128
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2128
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, DagRun, scheduler
>    Affects Versions: 1.9.0
>            Reporter: Máté Szabó
>            Priority: Major
>              Labels: performance, usability
>         Attachments: tall_dag.py, wide_dag.py
>
>
> Tall DAG = a DAG with long chains of dependencies, e.g.: 0 -> 1 -> 2 -> ... -> 998 -> 999
> Wide DAG = a DAG with many short, parallel dependencies, e.g.: 0 -> 1; 0 -> 2; ...; 0 -> 999
> Take a super simple case where both graphs have 1000 tasks, and all the tasks are
> just "sleep 0.03" bash commands (see the attached files).
> With the default SequentialExecutor (without parallelism), I would expect my 2 example
> DAGs to take (approximately) the same time to run, but apparently this is not the case.
> For the wide DAG, about 80 tasks executed successfully in 10 minutes; for the tall
> one, 0.
> This anomaly also seems to affect the web UI. Opening the graph view or the tree view
> for the wide DAG takes about 6 seconds on my machine, but for the tall one it takes
> significantly longer; in fact, it currently does not load at all.
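
The two shapes described above can be sketched as plain edge lists. This is only an illustration of the structure, not the attached tall_dag.py / wide_dag.py (which are not reproduced in this message), and it does not use Airflow; the function names are made up for the example. It shows that both graphs have the same number of tasks and dependencies and differ only in the length of the longest chain:

```python
def tall_edges(n):
    """Chain: 0 -> 1 -> 2 -> ... -> n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def wide_edges(n):
    """Fan-out: 0 -> 1; 0 -> 2; ...; 0 -> n-1."""
    return [(0, i) for i in range(1, n)]


def longest_chain(n, edges):
    """Number of tasks on the longest dependency chain.

    Assumes every edge (u, v) has u < v, so the task ids are already
    in topological order and one pass over the sorted edges suffices.
    """
    depth = [1] * n
    for u, v in sorted(edges):
        depth[v] = max(depth[v], depth[u] + 1)
    return max(depth)


n = 1000
tall, wide = tall_edges(n), wide_edges(n)
print("edges:", len(tall), len(wide))
print("longest chain:", longest_chain(n, tall), longest_chain(n, wide))
```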



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
