airflow-dev mailing list archives

From Amit Jain <>
Subject Best practices on Long running process over LB
Date Tue, 18 Apr 2017 21:47:58 GMT
Hi All,

We have a use case where we are building an Airflow DAG consisting of a few
tasks, and each task (HttpOperator) calls a service running behind an
AWS Elastic Load Balancer (ELB).

Since these tasks are long-running processes, I'm getting a 504 GATEWAY
TIMEOUT HTTP status code, which results in an incorrect task status on the
Airflow side.

IMO, to solve this problem we can choose among the following approaches:

   - Make a call to the service; the service sends back a response
   immediately and processes the actual request in another thread/process.
   A monitoring thread would heartbeat the task status to the DB. On the
   Airflow side, immediately after each HttpOperator task, we would have a
   sensor that checks for the status change at a given poke interval.
   - Since we have around 1500 tasks running per hour, use a service
   discovery system like Apache ZooKeeper to pick a node in round-robin
   fashion and make a direct connection to the node running the service.
   - AWS ELB limits the HTTP idle timeout to 1 hr and my tasks take ~3 hr
   to finish, so no change on the AWS ELB side is possible.
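
To make the sensor half of the first approach concrete, here is a minimal
sketch of the polling loop in plain Python. It is not Airflow code: the
function name poke_until_done, the fetch_status callable, and the status
strings "RUNNING", "SUCCESS" and "FAILED" are all assumptions for
illustration. In a real DAG, fetch_status would be a short status request
or DB query that returns well within the ELB idle timeout.

```python
import time


def poke_until_done(fetch_status, poke_interval=60.0, timeout=4 * 3600.0):
    """Call fetch_status() every poke_interval seconds until the remote
    task reports a terminal state or the timeout elapses.

    fetch_status is a hypothetical callable returning "RUNNING",
    "SUCCESS" or "FAILED" (assumed status values).
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status == "SUCCESS":
            return True
        if status == "FAILED":
            raise RuntimeError("remote task reported failure")
        if time.monotonic() >= deadline:
            raise TimeoutError("remote task did not finish before the timeout")
        time.sleep(poke_interval)


if __name__ == "__main__":
    # Simulated status source standing in for the service's status endpoint.
    fake_statuses = iter(["RUNNING", "RUNNING", "SUCCESS"])
    print(poke_until_done(lambda: next(fake_statuses), poke_interval=0.01))
```

Each status check is a short request, so no single HTTP connection has to
stay open for the full ~3 hr run.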

Both approaches have drawbacks. The first one makes us change the current
flow in each service, i.e. handle the request in async mode and heartbeat
the status of the executing process/thread at some interval, which means
extra DB writes.
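
For reference, the service-side change I have in mind looks roughly like
the sketch below. It is only an illustration under assumptions: submit,
get_status and the in-memory STATUS_DB dict are hypothetical names, and
the dict stands in for the real DB table. The request handler would call
submit() and return the job id right away, while a worker thread does the
actual work and a second thread refreshes the heartbeat.

```python
import threading
import time
import uuid

STATUS_DB = {}  # stand-in for the status table (assumption, not a real DB)
_LOCK = threading.Lock()


def _set_status(job_id, state):
    # One "DB write": record the state and the time of the last heartbeat.
    with _LOCK:
        STATUS_DB[job_id] = {"state": state, "heartbeat": time.time()}


def get_status(job_id):
    with _LOCK:
        return STATUS_DB[job_id]["state"]


def _run(job_id, work, heartbeat_interval):
    done = threading.Event()

    def heartbeat():
        # Refresh the heartbeat until the worker signals completion.
        while not done.wait(heartbeat_interval):
            _set_status(job_id, "RUNNING")

    hb = threading.Thread(target=heartbeat, daemon=True)
    hb.start()
    try:
        work()
        final = "SUCCESS"
    except Exception:
        final = "FAILED"
    finally:
        done.set()
        hb.join()  # ensure no heartbeat overwrites the final state
    _set_status(job_id, final)


def submit(work, heartbeat_interval=30.0):
    """Start work in the background and return a job id immediately."""
    job_id = str(uuid.uuid4())
    _set_status(job_id, "RUNNING")
    threading.Thread(
        target=_run, args=(job_id, work, heartbeat_interval), daemon=True
    ).start()
    return job_id
```

The number of DB writes per task is roughly the task duration divided by
the heartbeat interval, so the interval is the knob that trades status
freshness against write load.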

I'm interested to know how you are handling this problem, and I'd welcome
any suggestions or improvements to the approaches mentioned above.

