airflow-dev mailing list archives

From soma dhavala <soma.dhav...@gmail.com>
Subject question on embarrassing parallelism
Date Tue, 05 Feb 2019 15:20:26 GMT
Imagine that you have a Celery (or another cluster) executor with “m” workers (all with
equal resources).

Say I have the following dag:

[0]<—[1a]<—[1b]<—[1c]
[0]<—[2a]<—[2b]<—[2c]
...
[0]<—[na]<—[nb]<—[nc]

In the above, the [0] node is the parent of “n” identical branches.
Suppose [0]’s computational time is negligible and each branch [a,b,c] takes “h” hours.
Will all “n” branches then run in parallel across the “m” Celery workers (all with equal
resources), so that the whole dag finishes in approximately “(n/m)·h” hours?
 
I know a small dag could be created to test the concept; I just want to check whether there
is a theoretical answer here that other devs are aware of.
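As a rough back-of-the-envelope check (plain Python, not Airflow itself), the question can be simulated with a greedy scheduler. This sketch assumes each branch is a serial chain that occupies one worker-slot for h hours in total, and that the scheduler always keeps workers busy; under those assumptions the finish time comes out to ceil(n/m)·h rather than (n/m)·h, because a partially filled last “wave” of branches still costs a full h.

```python
import heapq
import math

def makespan(n_branches, m_workers, h_hours):
    """Greedily schedule n identical branches, each needing h hours of
    serial work, onto m equal workers; return total wall-clock time."""
    # Each heap entry is the time at which that worker becomes free.
    workers = [0.0] * m_workers
    heapq.heapify(workers)
    for _ in range(n_branches):
        start = heapq.heappop(workers)        # earliest-free worker
        heapq.heappush(workers, start + h_hours)
    return max(workers)

# 10 branches on 4 workers at 2h each: three waves (4 + 4 + 2 branches)
print(makespan(10, 4, 2.0))                   # 6.0
print(math.ceil(10 / 4) * 2.0)                # 6.0 — same answer
```

In a real deployment the result also depends on scheduler settings such as `parallelism`, `dag_concurrency`, and per-worker `worker_concurrency`, any of which can cap how many branches actually run at once.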

thanks,
-soma

> On Feb 5, 2019, at 8:36 PM, Iván Robla Albarrán <ivanrobla@gmail.com> wrote:
> 
> Hi, 
> 
> In the connection examples I see spark_default, which is a Spark-type connection. I want
to create another Spark connection, but I can't find Spark among the connection types.
> 
> Images Attached
> 
> Could you help me? 
> 
> Thanks!!!
> 
> 

