airflow-dev mailing list archives

From Stefan Seelmann <m...@stefan-seelmann.de>
Subject How to wait for external process
Date Sat, 26 May 2018 15:50:41 GMT
Hello,

I have a DAG (externally triggered) where some processing is done on an
external system (EC2 instance). The processing is started by an Airflow
task (via HTTP request). The DAG should only continue once that
processing is completed. In a first, naive implementation I created a
sensor that polls the progress (via HTTP request) and returns true only
when the status is "finished", so that the DAG run continues. That works
but...

... the external processing can take hours or days, and during that time
a worker is occupied doing nothing but HTTP GET and sleep. There will be
hundreds of DAG runs in parallel, which means hundreds of workers are
occupied.
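
For illustration, the naive sensor described above boils down to roughly
the following loop. This is only a sketch in plain Python, not the actual
sensor code: the function names, the JSON response shape with a "status"
field, and the 60-second interval are assumptions made for the example.

```python
import json
import time
import urllib.request


def fetch_status(url):
    """GET the (hypothetical) status endpoint and return its "status" field."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)["status"]


def wait_until_finished(url, fetch=fetch_status, poke_interval=60,
                        sleep=time.sleep):
    """Block until the external job reports "finished".

    Returns the number of polls that were needed. The caller (here: the
    worker running the sensor task) stays occupied for the whole duration.
    """
    polls = 0
    while True:
        polls += 1
        if fetch(url) == "finished":
            return polls
        # Nothing useful happens here; the worker just sleeps.
        sleep(poke_interval)
```

The `fetch` and `sleep` parameters exist only so the loop can be tried out
without a real endpoint or real waiting. The point is that the loop itself
is trivial, yet it holds a worker for as long as the external job runs.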

I looked into other operators that do computation on external systems
(ECSOperator, AWSBatchOperator) but they also follow that pattern and
just wait/sleep.

So I want to ask: is there a more efficient way to build such a
workflow with Airflow?

Kind Regards,
Stefan
