From: Daniel Standish
Date: Mon, 21 Jun 2021 13:22:52 -0700
Subject: Re: Best Practice: dynamic dags with external dependencies
To: users@airflow.apache.org

The only hurdle with this approach is getting the file into every running container (depending on your infrastructure setup). For example, if worker 1 picks up the "update config" task and writes the config file to its local disk, that file will not be visible to the scheduler container or to worker 2.

Do you have a network drive mounted into every container so that, once the config file is updated, it is immediately available to all containers? Or some other solution?

What I have done in this scenario is have the "update config" DAG update an Airflow Variable. The dynamic DAG then reads that Variable to generate its tasks. This avoids the file problem I described above. It does mean a call to the metastore every time the DAG file is parsed, but in practice that has not seemed to be a problem.
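A minimal sketch of this Variable-based pattern, assuming Airflow 2.x. The Variable key `dynamic_items`, the DAG ids, and the helpers `fetch_items_from_api` and `process_item` are hypothetical placeholders, and the "update config" DAG and the dynamic DAG share one file only for brevity. The Airflow imports are guarded so the two pure helpers can be exercised without an Airflow install:

```python
"""Sketch: Variable-driven dynamic DAG (assumes Airflow 2.x; names are hypothetical)."""
import json
import re
from datetime import datetime

ITEMS_KEY = "dynamic_items"  # hypothetical Variable key holding a JSON list of item names


def safe_task_id(item: str) -> str:
    """Replace characters that are not allowed in a task_id with underscores."""
    return re.sub(r"[^A-Za-z0-9_.-]", "_", item)


def parse_items(raw) -> list:
    """Decode the stored JSON list; a missing or malformed value means 'no items'."""
    try:
        items = json.loads(raw)
    except (TypeError, ValueError):
        return []
    return [str(item) for item in items] if isinstance(items, list) else []


def fetch_items_from_api() -> list:
    """Hypothetical placeholder for the slow external API call."""
    raise NotImplementedError("call the real API here")


def process_item(item: str, step: str) -> None:
    """Hypothetical placeholder for the per-item work."""
    print(f"{step}: {item}")


try:
    from airflow import DAG
    from airflow.models import Variable
    from airflow.operators.python import PythonOperator
except ImportError:  # lets the helpers above be tested without Airflow installed
    DAG = Variable = PythonOperator = None


def refresh_items() -> None:
    """Task callable for the "update config" DAG: store the item list as JSON."""
    # Sorted so the stored value does not change just because the API reordered items.
    items = sorted(set(fetch_items_from_api()))
    Variable.set(ITEMS_KEY, json.dumps(items))


if DAG is not None:
    common = dict(start_date=datetime(2021, 6, 1), catchup=False)

    # The "update config" DAG: the only place the slow API is called.
    with DAG("update_items_config", schedule_interval="@hourly", **common) as update_dag:
        PythonOperator(task_id="refresh_items", python_callable=refresh_items)

    # Module-level code runs on every parse; this is the metastore call mentioned above.
    # default_var avoids a KeyError before the updater has ever run.
    items = parse_items(Variable.get(ITEMS_KEY, default_var="[]"))

    # The dynamic DAG: one extract >> load chain per item.
    with DAG("dynamic_items", schedule_interval="@daily", **common) as dynamic_dag:
        for item in items:
            tid = safe_task_id(item)
            extract = PythonOperator(
                task_id=f"extract_{tid}",
                python_callable=process_item,
                op_kwargs={"item": item, "step": "extract"},
            )
            load = PythonOperator(
                task_id=f"load_{tid}",
                python_callable=process_item,
                op_kwargs={"item": item, "step": "load"},
            )
            extract >> load
```

Because the task list is built from whatever the Variable held at the last parse, newly fetched items show up once the scheduler re-parses the file, not in the run that fetched them. Two items that differ only in disallowed characters would map to the same task_id, which Airflow rejects as a duplicate; a real version should guard against that.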

Another option I have considered is generating the config file during deployment and baking it into the image, but that requires more setup than the Variable approach, so I did not go that route.

Having one "config update" DAG for all such processes seems like a pretty good way to go. For now, though, I update the config Variable within the same DAG that uses the config.

On Mon, Jun 21, 2021 at 12:55 PM Dan Andreescu <dandreescu@wikimedia.org> wrote:
> Hi, this is a question about best practices, as we build our Airflow instance and establish coding conventions.
>
> We have a few jobs that follow this pattern:
>
> - An external API defines a list of items. Calls to this API are slow, let's say on the order of minutes.
> - For each item in this list, we want to launch a sequence of tasks.
>
> So far, from reading about and experimenting with Airflow, we think this might be a good approach:
>
> 1. A separate "Generator" DAG calls the API and generates a config file with the list of items.
> 2. The "Actual" DAG reads the config file at DAG parsing time and generates a dynamic DAG accordingly.
>
> Are there other preferred ways to do this kind of thing? Thanks in advance!
>
> Dan Andreescu
> Wikimedia Foundation
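For the file-based design in the original question, two details are worth handling however the file is shared: the scheduler may parse the "Actual" DAG while the "Generator" is halfway through writing, and the file may not exist yet on a fresh deployment. A minimal sketch using only the standard library; the function names are hypothetical, and the path is assumed to live on storage mounted into every scheduler and worker container (whether a rename is atomic on a network mount depends on the filesystem):

```python
"""Sketch: file-based config with an atomic write and a tolerant parse-time read."""
import json
import os
import tempfile
from pathlib import Path


def write_items_atomically(items, path) -> None:
    """Generator side: readers see either the old file or the new one, never half of it."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must be in the same directory (same filesystem) for os.replace to be atomic.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp:
            json.dump(sorted(set(items)), tmp)
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)  # don't leave stray temp files behind on failure
        raise


def read_items(path) -> list:
    """Parse-time side: a missing or corrupt file yields no items instead of an import error."""
    try:
        with open(path) as f:
            items = json.load(f)
    except (OSError, ValueError):
        return []
    return [str(item) for item in items] if isinstance(items, list) else []
```

In the "Actual" DAG file, `read_items(...)` would be called at module level with the shared path, and the resulting list looped over to create one chain of tasks per item.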