airflow-dev mailing list archives

From "Grant Nicholas"<>
Subject Re: Airflow kubernetes executor
Date Wed, 12 Jul 2017 16:21:55 GMT
Is having a static set of workers necessary? Launching a job on Kubernetes from a cached Docker
image takes a few seconds at most. I think that is an acceptable delay for a batch processing
system like Airflow.

Additionally, if you launch workers dynamically you can launch *any type* of worker, and you
don't have to statically allocate pools of worker types. For example, a single DAG could use a
Scala Docker image to do Spark calculations, a C++ Docker image to use some low-level numerical
library, and a Python Docker image by default to do any generic Airflow work. You can also size
workers according to their usage: maybe the Spark driver program only needs a few GB of RAM,
but the C++ numerical library needs many hundreds.
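
To make that concrete, here is a rough sketch of what per-task worker selection could look
like. It is only an illustration, not an existing Airflow API: the WorkerSpec class, the task
ids, and the example/... image names are all made up, and the memory figures are placeholders.
The only real thing it leans on is the standard Kubernetes Pod manifest layout.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkerSpec:
    """Hypothetical per-task worker description (not an Airflow API)."""
    image: str
    memory: str  # Kubernetes quantity string, e.g. "4Gi"


# Hypothetical default for tasks that don't ask for anything special.
DEFAULT_SPEC = WorkerSpec(image="example/airflow-python:latest", memory="1Gi")

# Hypothetical task-to-worker mapping for a single DAG.
TASK_SPECS = {
    "spark_aggregate": WorkerSpec(image="example/spark-scala:latest", memory="4Gi"),
    "numerical_solve": WorkerSpec(image="example/cpp-numerics:latest", memory="256Gi"),
}


def pod_manifest(task_id: str) -> dict:
    """Build a Pod manifest for one task, falling back to the default worker."""
    spec = TASK_SPECS.get(task_id, DEFAULT_SPEC)
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        # Pod names may not contain underscores, so swap them for dashes.
        "metadata": {"name": "worker-" + task_id.replace("_", "-")},
        "spec": {
            "restartPolicy": "Never",  # run the task once, then let the pod exit
            "containers": [
                {
                    "name": "worker",
                    "image": spec.image,
                    "resources": {"requests": {"memory": spec.memory}},
                }
            ],
        },
    }
```

Each task gets its own image and memory request, and anything not listed falls back to the
generic Python worker, so there are no fixed pools to size up front.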

I agree there is a bit of extra bookkeeping that needs to be done, but this is an important
tradeoff to make explicitly.
