airflow-dev mailing list archives

From Kevin Yang <yrql...@gmail.com>
Subject Re: Multiple Schedulers - "scheduler_lock"
Date Sat, 02 Mar 2019 01:31:59 GMT
Preventing double-triggering by having different schedulers parse separate
sets of DAG files sounds easier and more intuitive. I actually removed one
piece of the double-triggering prevention logic here
<https://github.com/apache/airflow/pull/4234/files#diff-a7f584b9502a6dd19987db41a8834ff9L127>
(it was expensive) and was relying on this lock
<https://github.com/apache/airflow/blob/master/airflow/models/__init__.py#L1233>
to prevent double-firing and safeguard our non-idempotent tasks (btw, the
insert can be changed to an insert overwrite to be idempotent).
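
To illustrate that last point, here is a minimal sketch. It is not the
actual Airflow or Airbnb code: the table `daily_metrics` and the function
`write_partition` are hypothetical names, and SQLite's INSERT OR REPLACE
stands in for whatever "insert overwrite" the target store provides. The
idea is that a task keyed on its execution date leaves the same state
behind even if it is fired twice.

```python
import sqlite3


def write_partition(conn, ds, value):
    """Hypothetical task body: write one row per execution date `ds`."""
    # INSERT OR REPLACE overwrites any existing row with the same primary
    # key, so a second run for the same `ds` does not add a duplicate row.
    conn.execute(
        "INSERT OR REPLACE INTO daily_metrics (ds, value) VALUES (?, ?)",
        (ds, value),
    )
    conn.commit()


def row_count(conn):
    """Return the number of rows currently in the table."""
    return conn.execute("SELECT COUNT(*) FROM daily_metrics").fetchone()[0]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_metrics (ds TEXT PRIMARY KEY, value INTEGER)")

write_partition(conn, "2019-03-01", 42)
write_partition(conn, "2019-03-01", 42)  # simulated double-fire of the task
```

With a plain INSERT the second call would fail on the primary key (or
duplicate the row if there were no key), which is why a double-fire hurts
non-idempotent tasks.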

Also, though at Airbnb we requeue tasks a lot, we haven't seen double-firing
recently.

Cheers,
Kevin Y

On Fri, Mar 1, 2019 at 2:08 PM Maxime Beauchemin <maximebeauchemin@gmail.com>
wrote:

> Forgot to mention: the intention was to use the lock, but I never
> personally got to do the second phase which would consist of skipping the
> DAG if the lock is on, and expire the lock eventually based on a config
> setting.
>
> Max
>
> On Fri, Mar 1, 2019 at 1:57 PM Maxime Beauchemin <
> maximebeauchemin@gmail.com>
> wrote:
>
> > My original intention with the lock was preventing "double-triggering" of
> > tasks (triggering refers to the scheduler putting the message in the
> > queue). Airflow now has good "double-firing-prevention" of tasks (firing
> > happens when the worker receives the message and starts the task): even
> > if the scheduler were to go rogue or restart and send multiple triggers
> > for a task instance, the worker(s) should only start one task instance.
> > That's done by running the database assertions behind the conditions
> > being met as a read database transaction (no task can alter the rows that
> > validate the assertion while it's getting asserted). In practice it's a
> > little tricky and we've seen rogue double-firing in the past (I have no
> > idea how often that happens).
> >
> > If we do want to prevent double-triggering, we should make sure that 2
> > schedulers aren't processing the same DAG or DagRun at the same time.
> > That would mean having the scheduler not start processing locked DAGs,
> > and providing a mechanism to expire the locks after some time.
> >
> > Has anyone experienced double firing lately? If that exists we should fix
> > it, but also be careful around multiple scheduler double-triggering as it
> > would make that problem potentially much worse.
> >
> > Max
> >
> > On Fri, Mar 1, 2019 at 8:19 AM Deng Xiaodong <xd.deng.r@gmail.com>
> wrote:
> >
> >> It’s exactly what my team is doing & what I shared here earlier last year
> >> (
> >> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E
> >> )
> >>
> >> It’s somewhat of a “hacky” solution (and HA is not addressed), and now
> >> I’m thinking about how we can make it more proper & robust.
> >>
> >>
> >> XD
> >>
> >> > On 2 Mar 2019, at 12:04 AM, Mario Urquizo <mario.urquizo@gmail.com>
> >> wrote:
> >> >
> >> > We have been running multiple schedulers for about 3 months.  We
> created
> >> > multiple services to run airflow schedulers.  The only difference is
> >> that
> >> > we have each of the schedulers pointed to a directory one level deeper
> >> than
> >> > the DAG home directory that the workers and webapp use. We have seen
> >> much
> >> > better scheduling performance but this does not yet help with HA.
> >> >
> >> > DAGS_HOME:
> >> > {airflow_home}/dags  (webapp & workers)
> >> > {airflow_home}/dags/group-a/ (scheduler1)
> >> > {airflow_home}/dags/group-b/ (scheduler2)
> >> > {airflow_home}/dags/group-etc/ (scheduler3)
> >> >
> >> > Not sure if this helps, just sharing in case it does.
> >> >
> >> > Thank you,
> >> > Mario
> >> >
> >> >
> >> > On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbruin@gmail.com>
> >> wrote:
> >> >
> >> >> I have done quite some work on making it possible to run multiple
> >> >> schedulers at the same time.  At the moment I don’t think there are
> >> real
> >> >> blockers actually to do so. We just don’t actively test it.
> >> >>
> >> >> Database locking is mostly in place (DagRuns and TaskInstances), and I
> >> >> think the worst that can happen is that a task is scheduled twice. The
> >> >> task will detect this most of the time and kill one off if they run
> >> >> concurrently; if they run sequentially, it will run again on some
> >> >> occasions. Everyone has idempotent tasks, right, so no harm done? ;-)
> >> >>
> >> >> Have you encountered issues? Maybe work those out?
> >> >>
> >> >> Cheers
> >> >> Bolke.
> >> >>
> >> >> Sent from my iPad
> >> >>
> >> >>> On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.deng.r@gmail.com> wrote:
> >> >>>
> >> >>> Hi Max,
> >> >>>
> >> >>> Following
> >> >>
> >>
> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E
> >> >> <
> >> >>
> >>
> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E
> >> >,
> >> >> I’m trying to prepare an AIP for supporting multiple-scheduler in
> >> Airflow
> >> >> (mainly for HA and Higher scheduling performance).
> >> >>>
> >> >>> Along the process of code checking, I found that there is one
> >> attribute
> >> >> of DagModel, “scheduler_lock”. It’s not used at all in current
> >> >> implementation, but it was introduced long time back (2015) to allow
> >> >> multiple schedulers to work together (
> >> >>
> >>
> https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620
> >> >> <
> >> >>
> >>
> https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620
> >> >
> >> >> ).
> >> >>>
> >> >>> Since you were the original author of it, it would be very helpful
> if
> >> >> you can kindly share why the multiple-schedulers implementation was
> >> removed
> >> >> eventually, and what challenges/complexity there were.
> >> >>> (You already shared a few valuable inputs in the earlier discussion
> >> >>
> >>
> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E
> >> >> <
> >> >>
> >>
> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E
> >> >
> >> >> , mainly relating to hiccups around concurrency, cross DAG
> >> prioritisation &
> >> >> load on DB. Other than these, anything else you would like to
> advise?)
> >> >>>
> >> >>> I will also dive into the git history further to understand it
> better.
> >> >>>
> >> >>> Thanks.
> >> >>>
> >> >>>
> >> >>> XD
> >> >>
> >>
> >>
>
