airflow-commits mailing list archives

From Máté Szabó (JIRA) <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-2128) 'Tall' DAGs scale worse than 'wide' DAGs
Date Mon, 09 Apr 2018 11:29:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-2128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16430399#comment-16430399 ]

Máté Szabó commented on AIRFLOW-2128:
-------------------------------------


{code:java}
min_file_process_interval = 0
{code}

I believe this is the default setting.

> 'Tall' DAGs scale worse than 'wide' DAGs
> ----------------------------------------
>
>                 Key: AIRFLOW-2128
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2128
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DAG, DagRun, scheduler
>    Affects Versions: 1.9.0
>            Reporter: Máté Szabó
>            Priority: Major
>              Labels: performance, usability
>         Attachments: tall_dag.py, wide_dag.py
>
>
> Tall DAG = a DAG with long chains of dependencies, e.g. 0 -> 1 -> 2 -> ... -> 998 -> 999
> Wide DAG = a DAG with many short, parallel dependencies, e.g. 0 -> 1; 0 -> 2; ... 0 -> 999
> Take a super simple case where both graphs have 1000 tasks, and all the tasks are just "sleep 0.03" bash commands (see the attached files).
> With the default SequentialExecutor (without parallelism), I would expect my 2 example DAGs to take (approximately) the same time to run, but apparently this is not the case.
> For the wide DAG, about 80 tasks executed successfully in 10 minutes; for the tall one, 0.
> This anomaly also seems to affect the web UI. Opening the graph view or the tree view for the wide DAG takes about 6 seconds on my machine, but for the tall one it takes significantly longer; in fact, it currently does not load at all.
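
For illustration only, the two dependency shapes described above can be sketched as plain edge lists, without Airflow. This is not the content of the attached tall_dag.py / wide_dag.py; the helper names (tall_edges, wide_edges, longest_chain) are hypothetical. It only shows that both graphs have the same number of tasks and edges while the length of the longest dependency chain differs.

```python
# Hypothetical helpers, not taken from the attached files.

def tall_edges(n):
    """Chain of n tasks: 0 -> 1 -> ... -> n-1."""
    return [(i, i + 1) for i in range(n - 1)]


def wide_edges(n):
    """Fan-out of n tasks: 0 -> 1, 0 -> 2, ..., 0 -> n-1."""
    return [(0, i) for i in range(1, n)]


def longest_chain(n, edges):
    """Number of tasks on the longest dependency path.

    Assumes every edge points from a lower to a higher task id (true for
    both helpers above), so sorted edge order is a valid topological order.
    """
    depth = [1] * n
    for upstream, downstream in sorted(edges):
        depth[downstream] = max(depth[downstream], depth[upstream] + 1)
    return max(depth)


if __name__ == "__main__":
    n = 1000
    print(len(tall_edges(n)), longest_chain(n, tall_edges(n)))
    print(len(wide_edges(n)), longest_chain(n, wide_edges(n)))
```

In an actual DAG file, each task id would presumably correspond to one "sleep 0.03" bash task and each edge to one upstream/downstream dependency.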



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
