From: alireza.khoshkbari@gmail.com
To: dev@airflow.incubator.apache.org
Subject: Re: How Airflow import modules as it executes the tasks
Date: Wed, 16 May 2018 02:57:12 -0000

Thanks Kevin. Yes, I'm importing db in different operators. That said, my
understanding is that if a module has already been imported, it is not loaded
again even if you try to import it again (and I reckon this is why the
Singleton pattern is not commonly used in Python). Is that right?

On 2018/05/16 02:34:18, Ruiqin Yang wrote:
> Not exactly answering your question, but the reason db.py is loaded in each
> task is probably that you have something like `import db` in each of your
> *.py files, and Airflow spins up one process to parse each *.py file, so
> your db.py is loaded multiple times.
>
> I'm not sure how you can share the connection pool if it is created within
> the same process your operator is in, since Airflow spins up one process
> for each task even with the LocalExecutor. You might have to make the
> connection pool available to outside processes (I'm not sure how that can
> be done) to be able to share it.
>
> Cheers,
> Kevin Y
>
> On Tue, May 15, 2018 at 6:21 PM, alireza.khoshkbari@gmail.com <
> alireza.khoshkbari@gmail.com> wrote:
>
> > To start off, here is my project structure:
> > ├── dags
> > │   ├── __init__.py
> > │   ├── core
> > │   │   ├── __init__.py
> > │   │   ├── operators
> > │   │   │   ├── __init__.py
> > │   │   │   ├── first_operator.py
> > │   │   └── util
> > │   │       ├── __init__.py
> > │   │       ├── db.py
> > │   ├── my_dag.py
> >
> > Here are the versions and details of the Airflow Docker setup:
> >
> > In different tasks of my DAG I'm connecting to a database (not the Airflow
> > db). I've set up db connection pooling and expected my db.py to be loaded
> > once across the DagRun. However, in the log I can see that each task
> > imports the module and new db connections are made by each and every task.
> > I can tell that db.py is loaded in each task because of this line in db.py:
> >
> > logging.info("I was loaded {}".format(random.randint(0,100)))
> >
> > I understand that each operator can technically run on a separate machine,
> > and it makes sense that each task runs more or less independently. However,
> > I'm not sure this applies when using the LocalExecutor. The question is:
> > how can I share resources (db connections) across tasks when using the
> > LocalExecutor?
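
For reference on the import-caching point above: within a single Python
process a module's top-level code runs only once, and later imports are
served from the sys.modules cache. A minimal, self-contained sketch (using
the stdlib json module as a stand-in for db.py):

    # demo_import_cache.py -- run inside one Python process
    import sys

    import json             # first import: json's module-level code executes
    import json as json2    # second import: served from the sys.modules cache

    print(json is json2)             # True -- both names refer to the same module object
    print("json" in sys.modules)     # True -- the cached entry later imports reuse

The same holds for db.py: within one task process it executes only once, no
matter how many operators import it. The repeated "I was loaded ..." log
lines come from separate processes, not repeated imports in one process.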
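
To see why each task logs its own "I was loaded ..." line under the
LocalExecutor, here is a rough illustration of per-process module state.
This is plain multiprocessing, not how Airflow actually launches tasks, and
POOL_ID is a hypothetical stand-in for a module-level pool like the one in
db.py:

    # demo_per_process_state.py
    import multiprocessing as mp
    import os
    import random

    # Stand-in for a connection pool created at import time in db.py.
    POOL_ID = random.randint(0, 100)
    print("module loaded in pid={} POOL_ID={}".format(os.getpid(), POOL_ID))

    def task():
        print("task in pid={} sees POOL_ID={}".format(os.getpid(), POOL_ID))

    if __name__ == "__main__":
        ctx = mp.get_context("spawn")   # fresh interpreter per child, similar to
                                        # a separately launched task process
        for _ in range(2):
            p = ctx.Process(target=task)
            p.start()
            p.join()

Each spawned child re-runs the module-level code and gets its own POOL_ID,
which matches the behaviour reported above: one pool per task process rather
than one per DagRun.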
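
On the original question of sharing db connections across tasks: a
module-level pool cannot be handed directly to other task processes, but a
lazily created, per-process engine at least guarantees one pool per process
instead of one connection per use. A minimal sketch for db.py, assuming
SQLAlchemy and a hypothetical DB_URL (not the setup from this thread):

    # db.py -- per-process pool, created lazily on first use
    import logging
    import os

    from sqlalchemy import create_engine

    DB_URL = "postgresql://user:password@dbhost:5432/mydb"  # hypothetical

    _engine = None

    def get_engine():
        # Create the engine (and its pool) once per process, then reuse it.
        global _engine
        if _engine is None:
            logging.info("Creating engine/pool in pid %s", os.getpid())
            _engine = create_engine(DB_URL, pool_size=5, max_overflow=0)
        return _engine

For genuinely sharing connections across processes, the usual route is
something external to the tasks, for example a server-side pooler such as
PgBouncer in front of the database, since separate OS processes cannot
safely pass socket-backed pool objects to one another.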