airflow-dev mailing list archives

From Dan Davydov <dan.davy...@airbnb.com.INVALID>
Subject Re: When to use pools?
Date Wed, 22 Jun 2016 00:05:56 GMT
Assuming you are using local/sequential executors, the backfill pool would
be used.
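
A minimal sketch of the behaviour described above, assuming a pool passed on the command line takes precedence over the pool declared on each task. The `TASK_POOLS` mapping and `effective_pool` helper are hypothetical names for illustration, not Airflow APIs.

```python
from typing import Dict, Optional

# Hypothetical task -> pool mapping, mirroring the DAG in the question:
# t1 and t2 declare their own pools, t0 declares none.
TASK_POOLS: Dict[str, Optional[str]] = {"t0": None, "t1": "pool1", "t2": "pool2"}


def effective_pool(task_id: str, cli_pool: Optional[str] = None) -> Optional[str]:
    """Return the pool a task instance would draw a slot from.

    Assumption: a pool given on the command line overrides the task's
    own pool; without one, the task's declared pool (if any) applies.
    """
    return cli_pool or TASK_POOLS[task_id]


for task_id in TASK_POOLS:
    # With an override, every task resolves to the same pool.
    print(task_id, effective_pool(task_id, cli_pool="backfill"))
```

Under that assumption, a backfill started with `--pool backfill` would take slots only from 'backfill', and pool1 and pool2 would stay untouched for the duration of the run.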

On Mon, Jun 20, 2016 at 10:47 PM, harish singh <harish.singh22@gmail.com>
wrote:

> Hmm, thanks Lance. I mentioned a pool for 'backfill' because I saw
> it used in the 'default_args' of an Airflow example.
>
> Chris/Dan/Bolke/Jeremiah/Paul/all :) :
> So suppose I create two pools, 'pool1' and 'pool2',
> and use them for tasks t1 and t2. Now say I also create a pool called
> 'backfill' but do not use it in any of the tasks inside my DAG.
>
> Whenever I run the backfill for my dag with  ```--pool backfill```,
> will the scheduler use the slots from this backfill pool or will the tasks
> use pool1 and pool2?
>
>
>
> On Mon, Jun 20, 2016 at 9:20 PM, Dan Davydov <dan.davydov@airbnb.com.invalid>
> wrote:
>
> > At the moment by default backfill does not use a pool but you can specify
> > one with --pool.
> >
> > On Mon, Jun 20, 2016 at 9:02 PM, Chris Riccomini <criccomini@apache.org>
> > wrote:
> >
> > > Hey Harish,
> > >
> > > One thing that I'm not clear on is whether backfill even honors pools
> > > at all. I believe backfill currently starts its own scheduler outside
> > > of the main scheduler process. As a result, I think the pools are
> > > completely disregarded. Bolke/Jeremiah/Paul can correct me if I'm wrong.
> > >
> > > Cheers,
> > > Chris
> > >
> > > On Mon, Jun 20, 2016 at 7:46 PM, Lance Norskog <lance.norskog@gmail.com>
> > > wrote:
> > >
> > > > One reason to use Pools is because you have tasks in different DAGs
> > > > that all use the same resource, like a database. A Pool lets you say,
> > > > "I will send no more than 3 requests to this database at once".
> > > > However, there are bugs in the scheduler and it is possible to have
> > > > many active tasks overscheduled against a pool.
> > > >
> > > > You can create a pool in the Admin->Pools drop-down. You don't need a
> > > > script.
> > > >
> > > > On Mon, Jun 20, 2016 at 2:46 PM, harish singh <harish.singh22@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > > We have been using Airflow for about 3 months now.
> > > > >
> > > > > > One pain point I hit: during backfill, if I have 2 tasks t1 and
> > > > > > t2, with t1 having depends_on_past=true,
> > > > >               t0 -> t1
> > > > >               t0 -> t2
> > > > >
> > > > > > I find that the task t2 with no past dependency keeps getting
> > > > > > scheduled. This causes the task t1 to wait for a long time before
> > > > > > it gets scheduled.
> > > > >
> > > > > > I think this is a good use case for creating "pools" and
> > > > > > allocating slots for each pool.
> > > > > > Also, I will have to use priority_weights and adjust parallelism!
> > > > >
> > > > > Is there a better way to handle this?
> > > > >
> > > > >
> > > > > Also, in general, are there any examples on how to use pools?
> > > > >
> > > > > > I peeked into airflow/tests/operators/subdag_operator.py and
> > > > > > found the snippet below:
> > > > >
> > > > > session = airflow.settings.Session()
> > > > > pool_1 = airflow.models.Pool(pool='test_pool_1', slots=1)
> > > > > session.add(pool_1)
> > > > > session.commit()
> > > > >
> > > > > > Why do we need a Session instance? Do we need to run the code
> > > > > > below before creating a pool in code (inside my pipeline.py under
> > > > > > the dags/ directory)?
> > > > >
> > > > > > pool = (
> > > > > >     session.query(Pool)
> > > > > >     .filter(Pool.pool == 'AIRFLOW-205')
> > > > > >     .first())
> > > > > > if not pool:
> > > > > >     session.add(Pool(pool='AIRFLOW-205', slots=8))
> > > > > >     session.commit()
> > > > >
> > > > >
> > > > > > Also, I saw a few places where pool: 'backfill' is used.
> > > > >
> > > > > Is 'backfill' a special pre-defined pool?
> > > > >
> > > > >
> > > > > > If not, how do we create different types of pools based on
> > > > > > whether it is backfill or not?
> > > > >
> > > > >
> > > > > > All this is being done in the pipeline.py script under the 'dags/'
> > > > > > directory.
> > > > >
> > > > >
> > > > > Thanks,
> > > > > Harish
> > > > >
> > > >
> > > >
> > > >
> > > > --
> > > > Lance Norskog
> > > > lance.norskog@gmail.com
> > > > Redwood City, CA
> > > >
> > >
> >
>
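
To make the pool idea from this thread concrete, here is a toy model of slot-limited scheduling with priority weights. It is only a sketch under stated assumptions, not Airflow's scheduler: `QueuedTask`, `pick_runnable`, the slot counts, and the weights are all hypothetical and chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class QueuedTask:
    task_id: str
    pool: str
    priority_weight: int = 1


def pick_runnable(queued: List[QueuedTask], slots: Dict[str, int],
                  running: Counter) -> List[QueuedTask]:
    """Pick tasks to start, highest priority first, without exceeding
    the slot count of any pool."""
    used = Counter(running)  # copy so the caller's counter is untouched
    picked = []
    for task in sorted(queued, key=lambda t: -t.priority_weight):
        if used[task.pool] < slots[task.pool]:
            used[task.pool] += 1
            picked.append(task)
    return picked


# Five queued t2 instances compete with one t1 instance.
queued = [QueuedTask(f"t2_{i}", "pool2") for i in range(5)]
queued.append(QueuedTask("t1_0", "pool1", priority_weight=10))

picked = pick_runnable(queued, slots={"pool1": 1, "pool2": 2}, running=Counter())
print([t.task_id for t in picked])
```

In this model t1 always finds a free slot because it has its own pool, while t2 is capped at two concurrent instances however many are queued. The higher weight only decides the order in which tasks are considered.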
