From dev-return-5260-archive-asf-public=cust-asf.ponee.io@airflow.incubator.apache.org Tue May 29 12:13:22 2018 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by mx-eu-01.ponee.io (Postfix) with SMTP id 5EE84180648 for ; Tue, 29 May 2018 12:13:21 +0200 (CEST) Received: (qmail 84456 invoked by uid 500); 29 May 2018 10:13:20 -0000 Mailing-List: contact dev-help@airflow.incubator.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@airflow.incubator.apache.org Delivered-To: mailing list dev@airflow.incubator.apache.org Received: (qmail 84443 invoked by uid 99); 29 May 2018 10:13:19 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd1-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Tue, 29 May 2018 10:13:19 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd1-us-west.apache.org (ASF Mail Server at spamd1-us-west.apache.org) with ESMTP id 0FE68C1FB8 for ; Tue, 29 May 2018 10:13:19 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd1-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -0.012 X-Spam-Level: X-Spam-Status: No, score=-0.012 tagged_above=-999 required=6.31 tests=[DKIM_SIGNED=0.1, DKIM_VALID=-0.1, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_PASS=-0.001, SPF_PASS=-0.001, T_DKIMWL_WL_MED=-0.01] autolearn=disabled Authentication-Results: spamd1-us-west.apache.org (amavisd-new); dkim=pass (1024-bit key) header.d=amplifytechnologies.onmicrosoft.com Received: from mx1-lw-us.apache.org ([10.40.0.8]) by localhost (spamd1-us-west.apache.org [10.40.0.7]) (amavisd-new, port 10024) with ESMTP id hbEex2feO7Tk for ; Tue, 29 May 2018 10:13:13 +0000 (UTC) Received: from NAM01-BN3-obe.outbound.protection.outlook.com (mail-bn3nam01on0137.outbound.protection.outlook.com [104.47.33.137]) by mx1-lw-us.apache.org (ASF Mail Server at mx1-lw-us.apache.org) with ESMTPS id 52D505F2F0 for ; Tue, 29 May 2018 10:13:13 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=AmplifyTechnologies.onmicrosoft.com; s=selector1-amplifynation-com01i; h=From:Date:Subject:Message-ID:Content-Type:MIME-Version:X-MS-Exchange-SenderADCheck; bh=pKS/t3I0RHEVRs7XdzCYEaXkmtLksMwa5byGPDJoY7A=; b=q9YMUg6h91w/pXzXXv/Tcfn26KVHJGWnQ+YCfjoiECsO0CxpqEPSpL5qfhCXevEu7DKys0klO0HJkMZWgPUINndcHpBZgVb1wOI/945QdwgNwNoj9ErfYlvF+Ni+RLYZ3M2T++dmseoUIpKXocmWOz8KZtD+GynNXhGGQo0VvD0= Received: from CY4PR0101MB3029.prod.exchangelabs.com (10.171.220.140) by CY4PR0101MB3031.prod.exchangelabs.com (10.171.220.142) with Microsoft SMTP Server (version=TLS1_2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id 15.20.797.11; Tue, 29 May 2018 10:13:04 +0000 Received: from CY4PR0101MB3029.prod.exchangelabs.com ([fe80::5ce:8ae7:84a3:5f91]) by CY4PR0101MB3029.prod.exchangelabs.com ([fe80::5ce:8ae7:84a3:5f91%3]) with mapi id 15.20.0820.010; Tue, 29 May 2018 10:13:04 +0000 From: Victor Noagbodji To: "dev@airflow.incubator.apache.org" Subject: Re: How to wait for external process Thread-Topic: How to wait for external process Thread-Index: AQHT9QlV7e1c56D4BUqlKFvhgfNcC6RFMwMAgABKh4CAABcigIAABjuAgADmSIA= Date: Tue, 29 May 2018 10:13:04 +0000 Message-ID: <11A6F5C8-CB38-4912-99EC-BEDF2C69E7D3@amplify-nation.com> References: <4f524e0b-f6ae-6536-c0c2-32c83c8a91eb@stefan-seelmann.de> In-Reply-To: Accept-Language: en-US Content-Language: en-US X-MS-Has-Attach: X-MS-TNEF-Correlator: authentication-results: spf=none (sender IP is ) smtp.mailfrom=vnoagbodji@amplify-nation.com; x-originating-ip: [75.40.141.120] x-ms-publictraffictype: Email x-microsoft-exchange-diagnostics: 1;CY4PR0101MB3031;7:GmrR1cfuLEoby1JYYRG+7Ba4YcIQv+/dOCHD3o4fg91eBp/EAhVhoXfQeELqa9DBUhWieEtojp56070RpFqroyLOKbD56cHY4O5DzcKduc7OHJGuOkxGXaM8zVEjWsAznahTuwrPVarIgkiPwGQQbwDLcJNuo9QJBneLBt+8Tw4Qsd1ry+VZh570rNJh3YDVcK9ytw8URWZonF1tObOQJSGTSnkxkkWJZ1wIsBllPSw8nN0TjF30eNnokwxg3MjN x-ms-exchange-antispam-srfa-diagnostics: SOS; x-microsoft-antispam: UriScan:;BCL:0;PCL:0;RULEID:(7020095)(4652020)(7021125)(5600026)(4534165)(7022125)(4603075)(4627221)(201702281549075)(7048125)(7024125)(7027125)(7028125)(7023125)(2017052603328)(7153060)(7193020);SRVR:CY4PR0101MB3031; x-ms-traffictypediagnostic: CY4PR0101MB3031: x-microsoft-antispam-prvs: x-exchange-antispam-report-test: UriScan:(271806183753584)(85827821059158); x-ms-exchange-senderadcheck: 1 x-exchange-antispam-report-cfa-test: BCL:0;PCL:0;RULEID:(6040522)(2401047)(8121501046)(5005006)(3002001)(93006095)(93001095)(3231254)(944501410)(52105095)(10201501046)(149027)(150027)(6041310)(2016111802025)(20161123558120)(20161123560045)(20161123562045)(20161123564045)(6043046)(6072148)(201708071742011)(7699016);SRVR:CY4PR0101MB3031;BCL:0;PCL:0;RULEID:;SRVR:CY4PR0101MB3031; x-forefront-prvs: 0687389FB0 x-forefront-antispam-report: SFV:NSPM;SFS:(10019020)(396003)(346002)(39830400003)(39380400002)(376002)(366004)(377424004)(189003)(199004)(129404003)(97736004)(5660300001)(36756003)(6916009)(6486002)(86362001)(93886005)(66066001)(99286004)(76176011)(305945005)(478600001)(7736002)(5640700003)(6436002)(229853002)(105586002)(59450400001)(14454004)(11346002)(102836004)(1730700003)(81156014)(81166006)(8676002)(316002)(53546011)(5250100002)(186003)(83716003)(446003)(26005)(2900100001)(106356001)(2616005)(476003)(486006)(53936002)(82746002)(6512007)(6116002)(3846002)(6506007)(6246003)(3660700001)(2501003)(25786009)(3280700002)(2906002)(68736007)(2351001)(33656002)(8936002);DIR:OUT;SFP:1102;SCL:1;SRVR:CY4PR0101MB3031;H:CY4PR0101MB3029.prod.exchangelabs.com;FPR:;SPF:None;LANG:en;PTR:InfoNoRecords;MX:1;A:1; received-spf: None (protection.outlook.com: amplify-nation.com does not designate permitted sender hosts) x-microsoft-antispam-message-info: Lchkvxh/2wXhZwXh9FmiXNddtgRvWoC2NNYWca9LJ2zESDWa9nURVHkbyyQgoiEOwZd094QKxTvOoOpBsHie8gayPrhLDmqLrL+lYqQHSEK5BCJyaxBBM53AZe2f5xgqohtslY6pjbOEGWRBTwRGEe+8Iac7cZFfg6MGxAhIss/hhWjmMDc4hMeHZWSPBFYi spamdiagnosticoutput: 1:99 spamdiagnosticmetadata: NSPM Content-Type: text/plain; charset="us-ascii" Content-ID: <80DE49AFFE2A0941BF0C38794610ADC0@prod.exchangelabs.com> Content-Transfer-Encoding: quoted-printable MIME-Version: 1.0 X-MS-Office365-Filtering-Correlation-Id: b9cba552-88b6-4dca-8ffd-08d5c54cc473 X-OriginatorOrg: amplify-nation.com X-MS-Exchange-CrossTenant-Network-Message-Id: b9cba552-88b6-4dca-8ffd-08d5c54cc473 X-MS-Exchange-CrossTenant-originalarrivaltime: 29 May 2018 10:13:04.6337 (UTC) X-MS-Exchange-CrossTenant-fromentityheader: Hosted X-MS-Exchange-CrossTenant-id: 6846dba9-cb1d-4bea-950f-3bcded6bfade X-MS-Exchange-Transport-CrossTenantHeadersStamped: CY4PR0101MB3031 hi, here's another vote for persistence. we did similar thing where processing = state is stored in the database. there is no part of the DAG that does a pe= riodic check. the DAG retriggers itself and its very first task is to figur= e out if there is work to do or bail out. > On May 28, 2018, at 4:28 PM, Ananth Durai wrote: >=20 > Since you already on AWS, the simplest thing I could think of is to write= a > signal file once the job finished and the downstream job waiting for the > signal file. In other words, the same pattern how the Hadoop jobs writing > `_SUCCESS` file and the downstream jobs depends on the signal file. >=20 > Regards, > Ananth.P, >=20 >=20 >=20 >=20 >=20 >=20 > On 28 May 2018 at 13:06, Stefan Seelmann wrote: >=20 >> Thanks Christopher for the idea. That would work, we already have such a >> "listener" that polls a queue (SQS) and creates the DAG runs. However it >> would have been nice to have the full process in one DAG to have a >> better overview about running jobs and leverage the gantt chart, but I >> think this can be accomplished via custom plugins and views. >>=20 >> On 05/28/2018 08:43 PM, Christopher Bockman wrote: >>> Haven't done this, but we'll have a similar need in the future, so have >>> investigated a little. >>>=20 >>> What about a design pattern something like this: >>>=20 >>> 1) When jobs are done (ready for further processing) they publish those >>> details to a queue (such as GC Pub/Sub or any other sort of queue) >>>=20 >>> 2) A single "listener" DAG sits and periodically checks that queue. If >> it >>> finds anything on it, it triggers (via DAG trigger) all of the DAGs whi= ch >>> are on the queue.* >>>=20 >>> * =3D if your triggering volume is too high, this may cause airflow iss= ues >> w/ >>> too many going at once; this could presumably be solved then via custom >>> rate-limiting on firing these >>>=20 >>> 3) The listener DAG resets itself (triggers itself) >>>=20 >>>=20 >>> On Mon, May 28, 2018 at 7:17 AM, Driesprong, Fokko >>=20 >>> wrote: >>>=20 >>>> Hi Stefan, >>>>=20 >>>> Afaik there isn't a more efficient way of doing this. DAGs that are >> relying >>>> on a lot of sensors are experiencing the same issues. The only way rig= ht >>>> now, I can think of, is doing updating the state directly in the >> database. >>>> But then you need to know what you are doing. I can image that this >> would >>>> be feasible by using an AWS lambda function. Hope this helps. >>>>=20 >>>> Cheers, Fokko >>>>=20 >>>> 2018-05-26 17:50 GMT+02:00 Stefan Seelmann : >>>>=20 >>>>> Hello, >>>>>=20 >>>>> I have a DAG (externally triggered) where some processing is done at = an >>>>> external system (EC2 instance). The processing is started by an Airfl= ow >>>>> task (via HTTP request). The DAG should only continue once that >>>>> processing is completed. In a first naive implementation I created a >>>>> sensor that gets the progress (via HTTP request) and only if status i= s >>>>> "finished" returns true and the DAG run continues. That works but... >>>>>=20 >>>>> ... the external processing can take hours or days, and during that >> time >>>>> a worker is occupied which does nothing but HTTP GET and sleep. There >>>>> will be hundreds of DAG runs in parallel which means hundreds of >> workers >>>>> are occupied. >>>>>=20 >>>>> I looked into other operators that do computation on external systems >>>>> (ECSOperator, AWSBatchOperator) but they also follow that pattern and >>>>> just wait/sleep. >>>>>=20 >>>>> So I want to ask if there is a more efficient way to build such a >>>>> workflow with Airflow? >>>>>=20 >>>>> Kind Regards, >>>>> Stefan >>>>>=20 >>>>=20 >>>=20 >>=20 >>=20