airflow-dev mailing list archives

From Kyle Hamlin <>
Subject Submitting 1000+ tasks to airflow programmatically
Date Wed, 21 Mar 2018 17:34:57 GMT

I'm currently using Airflow for some ETL tasks where I submit a Spark job
to a cluster and poll until it completes. This workflow is nice because it
is typically a single DAG. I'm now starting to do more machine learning
tasks and need to build a model per client, for 1000+ clients. My Spark
cluster is capable of handling this workload; however, it doesn't seem
scalable to hand-write 1000+ DAGs to fit a model for each client. I want
each client to have its own task instance so that it can be retried if it
fails without having to rerun all 1000+ tasks. How do I handle this type
of workflow in Airflow?
