From: Deng Xiaodong
Date: Thu, 6 Sep 2018 22:55:26 +0800
Subject: Re: Best Practice of Airflow Setting-Up & Usage
To: dev@airflow.incubator.apache.org
Cc: dev@airflow.apache.org

Thanks for sharing, Raman.

Based on what you shared, I think there are two points that may be worth
discussing further.

*Scaling up (given thousands of DAGs):* If you have thousands of DAGs, you
may encounter longer scheduling latency (actual start time minus planned
start time). Workers can be scaled horizontally by adding more worker
nodes, which is relatively straightforward. But the *Scheduler* may become
another bottleneck. The Scheduler can only run on one node (please correct
me if I'm wrong). Even if we can use multiple threads for it, that has its
limits. HA is another concern. This is also what our team is looking into
at the moment, since the scheduler is the biggest "bottleneck" we have
identified so far (does anyone have experience tuning scheduler
performance?).

*Broker for Celery Executor*: you may want to try RabbitMQ rather than
Redis/SQL as the broker? The Celery community actually had a proposal to
deprecate Redis as a broker (of course, this proposal was eventually
rejected) [https://github.com/celery/celery/issues/3274].

Regards,
XD

On Thu, Sep 6, 2018 at 6:10 PM ramandumcs@gmail.com wrote:

> Hi,
> We have a requirement to scale to run 1000s of concurrent DAGs. With the
> Celery executor we observed that the Airflow worker sometimes gets stuck
> if the connection to redis/mysql breaks
> (https://github.com/celery/celery/issues/3932
> https://github.com/celery/celery/issues/4457)
> Currently we are using Airflow 1.9 with LocalExecutor but are planning to
> switch to Airflow 1.10 with K8 Executor.
>
> Thanks,
> Raman Gupta
>
>
> On 2018/09/05 12:56:38, Deng Xiaodong wrote:
> > Hi folks,
> >
> > Could you kindly share how your organization is setting up Airflow and
> > using it? Especially in terms of architecture. For example,
> >
> > - *Setting-Up*: Do you install Airflow in a "one-time" fashion, or in a
> > containerized fashion?
> > - *Executor:* Which executor are you using (*LocalExecutor*,
> > *CeleryExecutor*, etc.)? I believe most production environments are
> > using *CeleryExecutor*?
> > - *Scale*: If using Celery, how many worker nodes do you normally add?
> > (Of course this depends on the workloads and the performance of your
> > worker nodes.)
> > - *Queue*: Is the Queue feature used in your architecture? For what
> > advantage? (For example, explicitly assigning network-bound tasks to a
> > worker node whose parallelism can be much higher than its number of
> > cores.)
> > - *SLA*: Do you have any SLA for your scheduling? (This is inspired by
> > @yrqls21's PR 3830
> > <https://github.com/apache/incubator-airflow/pull/3830>.)
> > - etc.
> >
> > Airflow's setup can be quite flexible, but I believe there is some sort
> > of best practice, especially in organisations where scalability is
> > essential.
> >
> > Thanks for sharing in advance!
> >
> >
> > Best regards,
> > XD
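
The scheduling latency mentioned in the reply above (actual start time minus planned start time) could be tracked with something like the sketch below. It is only an illustration: the function names `scheduling_latency` and `summarize` and the sample timestamps are hypothetical, and it does not use any Airflow API. In practice the planned and actual start times would have to come from wherever your deployment records them.

```python
from datetime import datetime, timedelta
from statistics import mean


def scheduling_latency(planned_start: datetime, actual_start: datetime) -> timedelta:
    """Latency as defined in the thread: actual start minus planned start."""
    return actual_start - planned_start


def summarize(runs):
    """Mean and max latency in seconds for (planned, actual) pairs.

    `runs` is a hypothetical list of (planned_start, actual_start) tuples.
    """
    latencies = [scheduling_latency(p, a).total_seconds() for p, a in runs]
    return {"mean_s": mean(latencies), "max_s": max(latencies)}


if __name__ == "__main__":
    # Made-up sample data: two task starts, 5 s and 15 s late.
    sample = [
        (datetime(2018, 9, 6, 0, 0, 0), datetime(2018, 9, 6, 0, 0, 5)),
        (datetime(2018, 9, 6, 0, 1, 0), datetime(2018, 9, 6, 0, 1, 15)),
    ]
    print(summarize(sample))
```

Watching the mean and the maximum separately may help tell a generally slow scheduler apart from occasional stalls, though which statistic matters depends on the SLA in question.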