airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Best practices on Long running process over LB
Date Tue, 18 Apr 2017 22:44:17 GMT
The proper way to do this is for your service to return a token (a unique
identifier for the long-running process) immediately, without waiting for
the work to finish, and for the caller to then poll a separate status
endpoint, passing that token.
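
A minimal sketch of that token flow in plain Python, just to make the shape
concrete. The callables `submit` and `get_status`, the status strings, and
the default intervals are all hypothetical stand-ins for whatever your
service exposes; nothing here is an Airflow or ELB API.

```python
import time
from typing import Callable


def run_with_token(
    submit: Callable[[], str],
    get_status: Callable[[str], str],
    poke_interval: float = 60.0,
    timeout: float = 4 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Start the job, then poll its status until it finishes or we time out.

    submit      -- hypothetical: starts the job and returns a token right away
    get_status  -- hypothetical: maps a token to "RUNNING"/"SUCCEEDED"/"FAILED"
    """
    # The submit call returns immediately, so it never sits on an open
    # connection long enough to hit the load balancer's idle timeout.
    token = submit()
    deadline = time.monotonic() + timeout
    while True:
        # Each status check is a short, independent request.
        status = get_status(token)
        if status in ("SUCCEEDED", "FAILED"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {token} still {status} after {timeout}s")
        sleep(poke_interval)
```

Note that the caller has to hold on to the token for the whole run, which
is the state you lose if the polling task dies.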

Since this is Airflow and you have the luxury of having a lot of predefined
sensors, you may just have to call a trigger endpoint async, and in the
next task have a sensor look for the actual byproduct of that service's
process (say, if the process generates an S3 file, you'd have an S3Sensor
right after the trigger task). The good thing about this approach is that
it is more "stateless" than the one using a token: tasks can die and be
retried without anyone having to keep track of the token.
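
To illustrate why the byproduct check is more stateless, here is a rough
sketch of the poke loop a sensor performs, in plain Python rather than
Airflow code. The function name, the `exists` callable, and the default
intervals are assumptions for illustration; in a real DAG one of the
predefined sensors would play this role.

```python
import time
from typing import Callable


def wait_for_byproduct(
    exists: Callable[[], bool],
    poke_interval: float = 60.0,
    timeout: float = 4 * 60 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poke until the byproduct shows up; return False if we time out.

    exists -- hypothetical check, e.g. "is the output file there yet?"
    """
    deadline = time.monotonic() + timeout
    while True:
        # The only input is where the output should appear. Nothing returned
        # by the trigger call is needed, so a restarted task can simply
        # start poking again.
        if exists():
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poke_interval)
```

For a local dry run, `exists` could be something like
`lambda: os.path.exists("/tmp/output.done")` (a made-up path, with `os`
imported); against S3 it would be whatever key check your sensor does.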

Max

On Tue, Apr 18, 2017 at 2:47 PM, Amit Jain <aj2011it@gmail.com> wrote:

> Hi All,
>
> We have a use case where we are building an Airflow DAG consisting of a few
> tasks, and each task (HttpOperator) calls a service running behind an
> AWS Elastic Load Balancer (ELB).
>
> Since these tasks are long-running processes, I'm getting a 504 GATEWAY
> TIMEOUT HTTP status code, which results in an incorrect task status on the
> Airflow side.
>
> IMO to solve this problem, we can choose among the following approaches:
>
>    - Make a call to the service; the service sends back a response right
>    away and processes the actual request in another thread/process. One
>    monitoring thread would heartbeat the task status to the DB. On the
>    Airflow side, immediately after each HttpOperator task, we would have a
>    sensor which checks for the status change at a given poke interval.
>    - Since we have around 1500 tasks running per hour, use a service
>    discovery system like Apache Zookeeper to pick a node in round-robin
>    fashion and make a direct connection to the node running the service.
>    - AWS ELB limits the HTTP idle timeout to 1 hr and my tasks take ~3 hr
>    to finish, so no change at the AWS ELB is possible.
>
>
> Both approaches have drawbacks. The first one makes us change the current
> flow in each service, i.e. handle requests in async mode and heartbeat the
> status of the executing process/thread at some interval, hence the extra
> DB writes.
>
> I'm interested to know how you are handling this problem, and in any
> suggestions or improvements to the approaches above that I could use.
>
>
> Thanks,
> Amit
>
