airflow-dev mailing list archives

From Tao Feng <fengta...@gmail.com>
Subject Re: Multiple Schedulers - "scheduler_lock"
Date Sat, 02 Mar 2019 07:39:23 GMT
Does the proposal use a master-slave architecture (leader scheduler vs.
slave scheduler)?

On Fri, Mar 1, 2019 at 5:32 PM Kevin Yang <yrqls21@gmail.com> wrote:

> Preventing double-triggering by separating the DAG files that different
> schedulers parse sounds easier and more intuitive. I actually removed one
> of the (expensive) double-triggering prevention checks here
> <https://github.com/apache/airflow/pull/4234/files#diff-a7f584b9502a6dd19987db41a8834ff9L127>
> and was relying on this lock
> <https://github.com/apache/airflow/blob/master/airflow/models/__init__.py#L1233>
> to prevent double-firing and safeguard our non-idempotent tasks (btw, the
> insert can be an insert overwrite to be idempotent).
>
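
To illustrate the "insert overwrite" remark above, here is a minimal sketch, not taken from Airflow or from Kevin's tasks. It assumes a hypothetical `results` table keyed by task and execution date, and uses SQLite's `INSERT OR REPLACE` as a stand-in for whatever overwrite statement the target warehouse offers. The function name `write_result` is made up for the example.

```python
import sqlite3

def write_result(conn, task_id, execution_date, value):
    """Idempotent write: a re-run replaces the row instead of adding a second one."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO results (task_id, execution_date, value) "
            "VALUES (?, ?, ?)",
            (task_id, execution_date, value),
        )

conn = sqlite3.connect(":memory:")
# Hypothetical table; the primary key is what makes the overwrite possible.
conn.execute(
    "CREATE TABLE results ("
    " task_id TEXT, execution_date TEXT, value INTEGER,"
    " PRIMARY KEY (task_id, execution_date))"
)
write_result(conn, "load_sales", "2019-03-01", 10)
write_result(conn, "load_sales", "2019-03-01", 10)  # simulated double-firing
row_count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

With a plain `INSERT` and no key, the second call would leave a duplicate row. With the keyed overwrite, a double-fired task leaves the table in the same state as a single run.
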
> Also, though at Airbnb we requeue tasks a lot, we haven't seen
> double-firing recently.
>
> Cheers,
> Kevin Y
>
> On Fri, Mar 1, 2019 at 2:08 PM Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>
> > Forgot to mention: the intention was to use the lock, but I never
> > personally got to do the second phase, which would consist of skipping the
> > DAG if the lock is on, and expiring the lock eventually based on a config
> > setting.
> >
> > Max
> >
> > On Fri, Mar 1, 2019 at 1:57 PM Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
> >
> > > My original intention with the lock was preventing "double-triggering"
> > > of tasks (triggering refers to the scheduler putting the message in the
> > > queue). Airflow now has good "double-firing-prevention" of tasks (firing
> > > happens when the worker receives the message and starts the task): even
> > > if the scheduler were to go rogue or restart and send multiple triggers
> > > for a task instance, the worker(s) should only start one task instance.
> > > That's done by running the database assertions behind the conditions
> > > being met as a read database transaction (no task can alter the rows
> > > that validate the assertion while it's getting asserted). In practice
> > > it's a little tricky and we've seen rogue double-firing in the past (I
> > > have no idea how often that happens).
> > >
> > > > If we do want to prevent double-triggering, we should make sure that 2
> > > > schedulers aren't processing the same DAG or DagRun at the same time.
> > > > That would mean having the scheduler not start processing locked DAGs,
> > > > and providing a mechanism to expire the locks after some time.
> > >
> > > > Has anyone experienced double-firing lately? If it exists we should fix
> > > > it, but also be careful around multiple-scheduler double-triggering, as
> > > > it would make that problem potentially much worse.
> > >
> > > Max
> > >
> > > > On Fri, Mar 1, 2019 at 8:19 AM Deng Xiaodong <xd.deng.r@gmail.com> wrote:
> > >
> > > >> It’s exactly what my team is doing & what I shared here earlier last
> > > >> year
> > > >> (https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E).
> > >>
> > > >> It’s a somewhat “hacky” solution (and HA is not addressed), and now I’m
> > > >> thinking about how we can make it more proper & robust.
> > >>
> > >>
> > >> XD
> > >>
> > > >> > On 2 Mar 2019, at 12:04 AM, Mario Urquizo <mario.urquizo@gmail.com> wrote:
> > >> >
> > > >> > We have been running multiple schedulers for about 3 months. We
> > > >> > created multiple services to run airflow schedulers. The only
> > > >> > difference is that we have each of the schedulers pointed to a
> > > >> > directory one level deeper than the DAG home directory that the
> > > >> > workers and webapp use. We have seen much better scheduling
> > > >> > performance but this does not yet help with HA.
> > >> >
> > >> > DAGS_HOME:
> > >> > {airflow_home}/dags  (webapp & workers)
> > >> > {airflow_home}/dags/group-a/ (scheduler1)
> > >> > {airflow_home}/dags/group-b/ (scheduler2)
> > >> > {airflow_home}/dags/group-etc/ (scheduler3)
> > >> >
> > >> > Not sure if this helps, just sharing in case it does.
> > >> >
> > >> > Thank you,
> > >> > Mario
> > >> >
> > >> >
> > > >> > On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbruin@gmail.com> wrote:
> > >> >
> > > >> >> I have done quite some work on making it possible to run multiple
> > > >> >> schedulers at the same time. At the moment I don’t think there are
> > > >> >> real blockers actually to do so. We just don’t actively test it.
> > >> >>
> > > >> >> Database locking is mostly in place (DagRuns and TaskInstances). And
> > > >> >> I think the worst that can happen is that a task is scheduled twice.
> > > >> >> The task will detect this most of the time and kill one off if they
> > > >> >> are concurrent; if they are sequential rather than concurrent, it
> > > >> >> will run again on some occasions. Everyone has idempotent tasks,
> > > >> >> right, so no harm done? ;-)
> > >> >>
> > >> >> Have you encountered issues? Maybe work those out?
> > >> >>
> > >> >> Cheers
> > >> >> Bolke.
> > >> >>
> > > >> >> Sent from my iPad
> > > >> >>
> > > >> >>> On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.deng.r@gmail.com> wrote:
> > >> >>>
> > >> >>> Hi Max,
> > >> >>>
> > > >> >>> Following
> > > >> >>> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E,
> > > >> >>> I’m trying to prepare an AIP for supporting multiple schedulers in
> > > >> >>> Airflow (mainly for HA and higher scheduling performance).
> > >> >>>
> > > >> >>> In the process of checking the code, I found that there is one
> > > >> >>> attribute of DagModel, “scheduler_lock”. It’s not used at all in the
> > > >> >>> current implementation, but it was introduced a long time back (2015)
> > > >> >>> to allow multiple schedulers to work together
> > > >> >>> (https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620).
> > >> >>>
> > > >> >>> Since you were the original author of it, it would be very helpful if
> > > >> >>> you could kindly share why the multiple-schedulers implementation was
> > > >> >>> eventually removed, and what challenges/complexity there were.
> > > >> >>> (You already shared a few valuable inputs in the earlier discussion
> > > >> >>> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E,
> > > >> >>> mainly relating to hiccups around concurrency, cross-DAG
> > > >> >>> prioritisation & load on DB. Other than these, anything else you
> > > >> >>> would like to advise?)
> > >> >>>
> > > >> >>> I will also dive into the git history further to understand it
> > > >> >>> better.
> > >> >>> Thanks.
> > >> >>>
> > >> >>>
> > >> >>> XD
> > >> >>
> > >>
> > >>
> >
>
