airflow-dev mailing list archives

From soma dhavala <>
Subject question on embarrassingly parallel DAG execution
Date Tue, 05 Feb 2019 15:20:26 GMT
Imagine that you have a Celery (or another cluster) executor with "m" workers (with equal resources).

Say I have the following DAG:

[0]<—[1a]<—[1b]<—[1c]
[0]<—[2a]<—[2b]<—[2c]
...
[0]<—[na]<—[nb]<—[nc]

In the above, node [0] is the parent of "n" identical branches.
Suppose [0]'s computational time is negligible, and each pipeline [a, b, c] takes, say, "h" hours.
Will all "n" branches then run in parallel (given a Celery cluster with "m" workers, all with equal resources), so that the whole DAG finishes in approximately "(n/m) * h" hours?
I know a small DAG could be created to test this, but I want to check whether there is a
theoretical answer that other devs are aware of.
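As a rough theoretical sketch (plain Python, no Airflow dependency), the scheduling arithmetic can be modeled by assigning each branch to the earliest-available worker. Note that under the stated assumptions (all workers equal, no scheduling overhead, each branch strictly serial) the wall-clock time works out to ceil(n/m) * h rather than (n/m) * h, because a partially filled last "wave" of branches still costs a full h. The function name and numbers below are illustrative, not anything from Airflow itself:

```python
import heapq
import math

def makespan(n_branches: int, m_workers: int, h_hours: float) -> float:
    """Simulate scheduling n identical serial branches, each taking
    h hours, onto m equal workers; return total wall-clock time."""
    # Each worker is represented by the time at which it becomes free.
    workers = [0.0] * m_workers
    heapq.heapify(workers)
    for _ in range(n_branches):
        free_at = heapq.heappop(workers)        # earliest-available worker
        heapq.heappush(workers, free_at + h_hours)
    return max(workers)

# 10 branches on 4 workers at 2 h each: waves of 4, 4, 2 branches.
print(makespan(10, 4, 2.0))      # -> 6.0 hours
print(math.ceil(10 / 4) * 2.0)   # -> 6.0, i.e. ceil(n/m) * h
```

When m divides n the two formulas agree, so "(n/m) * h" is the right back-of-the-envelope answer in the balanced case.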


> On Feb 5, 2019, at 8:36 PM, Iván Robla Albarrán <> wrote:
> Hi,
> In the connection examples I see spark_default, which is a Spark-type connection. I want to create another Spark connection, but I can't find Spark in the connection-type list.
> Images attached.
> Could you help me?
> Thanks!
