airflow-dev mailing list archives

From Mario Urquizo <mario.urqu...@gmail.com>
Subject Re: Multiple Schedulers - "scheduler_lock"
Date Fri, 01 Mar 2019 16:04:43 GMT
We have been running multiple schedulers for about 3 months.  We created
multiple services to run airflow schedulers.  The only difference is that
we have each of the schedulers pointed to a directory one level deeper than
the DAG home directory that the workers and webapp use. We have seen much
better scheduling performance but this does not yet help with HA.

DAGS_HOME:
{airflow_home}/dags  (webapp & workers)
{airflow_home}/dags/group-a/ (scheduler1)
{airflow_home}/dags/group-b/ (scheduler2)
{airflow_home}/dags/group-etc/ (scheduler3)
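
A minimal sketch of how a layout like the one above could be wired up, assuming each scheduler service is started with its own environment. The message does not say how the schedulers were actually pointed at their directories; the use of Airflow's AIRFLOW__CORE__DAGS_FOLDER environment override, the helper name scheduler_environments, and the path /opt/airflow are all assumptions made for illustration.

```python
from pathlib import PurePosixPath


def scheduler_environments(airflow_home, groups):
    """Build one environment mapping per scheduler service.

    Hypothetical helper: every scheduler gets a DAG folder one level
    below {airflow_home}/dags, while the webapp and workers would keep
    using {airflow_home}/dags itself.
    """
    dags_home = PurePosixPath(airflow_home) / "dags"
    envs = {}
    for index, group in enumerate(groups, start=1):
        envs[f"scheduler{index}"] = {
            "AIRFLOW_HOME": str(airflow_home),
            # Assumption: the per-scheduler directory is set through the
            # environment override of the [core] dags_folder option.
            "AIRFLOW__CORE__DAGS_FOLDER": str(dags_home / group),
        }
    return envs


envs = scheduler_environments("/opt/airflow", ["group-a", "group-b", "group-etc"])
for name, env in envs.items():
    print(name, env["AIRFLOW__CORE__DAGS_FOLDER"])
```

Each mapping could then be merged into the environment of the service that launches `airflow scheduler`. Because every DAG file lives under exactly one group directory, no two schedulers parse the same file, which fits the performance gain described above but, as noted, gives no failover.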

Not sure if this helps, just sharing in case it does.

Thank you,
Mario


On Fri, Mar 1, 2019 at 9:44 AM Bolke de Bruin <bdbruin@gmail.com> wrote:

> I have done quite a bit of work on making it possible to run multiple
> schedulers at the same time.  At the moment I don’t think there are any real
> blockers to doing so. We just don’t actively test it.
>
> Database locking is mostly in place (DagRuns and TaskInstances). And I
> think the worst that can happen is that a task is scheduled twice. The task
> will detect this most of the time and kill one off if the two run
> concurrently; if they run sequentially, it will run again on some occasions.
> Everyone has idempotent tasks, right, so no harm done? ;-)
>
> Have you encountered issues? Maybe work those out?
>
> Cheers
> Bolke.
>
> Sent from my iPad
>
> > On 1 Mar 2019, at 16:25, Deng Xiaodong <xd.deng.r@gmail.com> wrote:
> >
> > Hi Max,
> >
> > Following
> https://lists.apache.org/thread.html/0e21230e08f07ef6f8e3c59887e9005447d6932639d3ce16a103078f@%3Cdev.airflow.apache.org%3E,
> I’m trying to prepare an AIP for supporting multiple schedulers in Airflow
> (mainly for HA and higher scheduling performance).
> >
> > While reading through the code, I found that DagModel has an attribute,
> “scheduler_lock”. It’s not used at all in the current implementation, but it
> was introduced a long time ago (2015) to allow multiple schedulers to work
> together (
> https://github.com/apache/airflow/commit/2070bfc50b5aa038301519ef7c630f2fcb569620
> ).
> >
> > Since you were the original author of it, it would be very helpful if
> you could share why the multiple-scheduler implementation was eventually
> removed, and what challenges/complexity there were.
> > (You already shared some valuable input in the earlier discussion
> https://lists.apache.org/thread.html/d37befd6f04dbdbfd2a2d41722352603bc2e2f97fb47bdc5ba454d0c@%3Cdev.airflow.apache.org%3E
> , mainly relating to hiccups around concurrency, cross-DAG prioritisation &
> load on the DB. Other than these, is there anything else you would advise?)
> >
> > I will also dive into the git history further to understand it better.
> >
> > Thanks.
> >
> >
> > XD
>
