MIME-Version: 1.0
From: Maxime Beauchemin
Date: Tue, 18 Apr 2017 15:44:17 -0700
Subject: Re: Best practices on Long
 running process over LB
To: dev@airflow.incubator.apache.org
Content-Type: multipart/alternative; boundary=001a114433badbafc3054d78a5e0
archived-at: Tue, 18 Apr 2017 22:44:21 -0000

--001a114433badbafc3054d78a5e0
Content-Type: text/plain; charset=UTF-8

The proper way to do this is for your service to return a token (a unique
identifier for the long-running process) immediately, and for the caller to
then poll a second endpoint with that token to check on the status.

Since this is Airflow and you have the luxury of many predefined sensors,
you may only need to call a trigger endpoint asynchronously and, in the next
task, have a sensor look for the actual byproduct of that service's process
(say the process generates an S3 file: you'd have an S3Sensor right after
the trigger task).

The good thing about this approach is that it is more "stateless" than the
token approach: tasks can die without anyone having to worry about the
token.

Max

On Tue, Apr 18, 2017 at 2:47 PM, Amit Jain wrote:

> Hi All,
>
> We have a use case where we are building an Airflow DAG consisting of a few
> tasks, and each task (HttpOperator) calls a service running behind an
> AWS Elastic Load Balancer (ELB).
>
> Since these tasks are long-running processes, I'm getting a 504 GATEWAY
> TIMEOUT HTTP status code, which results in an incorrect task status on the
> Airflow side.
>
> IMO, to solve this problem we can choose among the following approaches:
>
> - Make a call to the service; the service sends back a response and
>   processes the actual request in another thread/process. A monitoring
>   thread would heartbeat the task status to the DB. On the Airflow side,
>   immediately after each HttpOperator, we should have a sensor that checks
>   for the status change at a given poke interval.
> - Since we have around 1500 tasks running per hour, use a service
>   discovery system like Apache ZooKeeper to pick a node in round-robin
>   fashion and make a direct connection to the node running the service.
> - AWS ELB limits the HTTP idle timeout to 1 hr and my tasks take ~3 hr
>   to finish, so no change on the AWS ELB side is possible.
>
>
> Both approaches have cons. The first one makes us change our current flow
> on each service side, i.e. handle requests in async mode and heartbeat the
> status of the executing process/thread at some interval, hence the DB
> writes.
>
> I'm interested to know how you guys are handling this problem, and in any
> suggestions or improvements to the approaches mentioned that I can use.
>
>
> Thanks,
> Amit
>

--001a114433badbafc3054d78a5e0--
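
The token pattern described in the reply above can be sketched in plain
Python. This is only an illustration under stated assumptions, not code
from the thread or from Airflow: the names submit, status and wait_for are
hypothetical, the job registry is an in-memory dict standing in for
whatever store a real service would use, and a background thread stands in
for the long-running process. submit plays the role of the trigger
endpoint, status the status endpoint, and wait_for the sensor that pokes
at a fixed interval.

```python
import threading
import time
import uuid

# Hypothetical in-memory job registry; a real service would persist this.
_jobs = {}
_lock = threading.Lock()


def _run(token, work):
    """Run the long job in the background and record its final state."""
    try:
        state = {"status": "done", "result": work()}
    except Exception as exc:  # record the failure instead of losing it
        state = {"status": "failed", "error": str(exc)}
    with _lock:
        _jobs[token] = state


def submit(work):
    """Trigger endpoint: register the job and return a token immediately."""
    token = uuid.uuid4().hex
    with _lock:
        _jobs[token] = {"status": "running"}
    threading.Thread(target=_run, args=(token, work), daemon=True).start()
    return token


def status(token):
    """Status endpoint: report the current state recorded for a token."""
    with _lock:
        return dict(_jobs.get(token, {"status": "unknown"}))


def wait_for(token, poke_interval=0.05, timeout=5.0):
    """Sensor-style loop: poll until the job leaves the running state."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        state = status(token)
        if state["status"] != "running":
            return state
        time.sleep(poke_interval)
    raise TimeoutError("job %s still running after %.1fs" % (token, timeout))
```

For example, wait_for(submit(lambda: 42)) returns
{"status": "done", "result": 42}. Because the caller only ever makes short
requests, no single HTTP call has to outlive the load balancer's idle
timeout.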