airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Concurrent schedulers
Date Tue, 23 May 2017 06:39:35 GMT
Hi Max,

We seem to be in quite good order already. We are testing with multi-master MySQL and will
also test multi-master Postgres. As we already do DagRun-level locking, DAG-level locking does
not seem to be required. Task instances are also locked, so running multiple schedulers seems
to work fine. If one of the schedulers restarts, it checks for orphaned tasks by inspecting the
executor queue, which is unique to each scheduler. This results in some tasks being dequeued
and then requeued. So Airflow is robust enough to stay alive in this setup (with my patch for
deadlocks applied), but some things are a bit sub-optimal.
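A minimal sketch of the two mechanisms described above, using Python's standard `sqlite3` module so it runs anywhere. It is not Airflow's code: Bolke describes task instances being locked in MySQL/Postgres, whereas this sketch swaps in an optimistic conditional `UPDATE` (SQLite has no `SELECT ... FOR UPDATE`) so that only one scheduler can move a task from `scheduled` to `queued`. The `task_instance` table, its `queued_by` column, and the function names are hypothetical; scoping the orphan reset to the restarting scheduler is shown only as one possible way to limit the dequeue/requeue churn.

```python
import sqlite3


def make_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical, heavily simplified task instance table.
    conn.execute(
        "CREATE TABLE task_instance ("
        " id TEXT PRIMARY KEY,"
        " state TEXT NOT NULL,"
        " queued_by TEXT)"  # hypothetical: id of the scheduler that queued it
    )
    return conn


def try_claim(conn, ti_id, scheduler_id):
    """Move one task from 'scheduled' to 'queued'; return False if another
    scheduler got there first. The WHERE clause makes check-and-set atomic."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "UPDATE task_instance SET state = 'queued', queued_by = ? "
            "WHERE id = ? AND state = 'scheduled'",
            (scheduler_id, ti_id),
        )
    return cur.rowcount == 1


def reset_orphans(conn, scheduler_id, executor_queue):
    """After a restart, put back tasks this scheduler queued that its
    per-scheduler executor queue no longer holds. Tasks queued by other
    schedulers are left alone."""
    rows = conn.execute(
        "SELECT id FROM task_instance WHERE state = 'queued' AND queued_by = ?",
        (scheduler_id,),
    ).fetchall()
    orphans = [ti_id for (ti_id,) in rows if ti_id not in executor_queue]
    with conn:
        conn.executemany(
            "UPDATE task_instance SET state = 'scheduled', queued_by = NULL "
            "WHERE id = ? AND state = 'queued' AND queued_by = ?",
            [(ti_id, scheduler_id) for ti_id in orphans],
        )
    return len(orphans)


if __name__ == "__main__":
    conn = make_db()
    with conn:
        conn.executemany(
            "INSERT INTO task_instance VALUES (?, 'scheduled', NULL)",
            [("t1",), ("t2",)],
        )
    print(try_claim(conn, "t1", "sched-a"))  # True
    print(try_claim(conn, "t1", "sched-b"))  # False: already queued by sched-a
    print(try_claim(conn, "t2", "sched-b"))  # True
    # sched-a restarts with an empty executor queue: only t1 is reset.
    print(reset_orphans(conn, "sched-a", executor_queue=set()))  # 1
```

On MySQL or Postgres the same conditional `UPDATE` works unchanged; row locks via `SELECT ... FOR UPDATE` are the alternative when more than one row must be read and written together.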

As mentioned, we are still stress testing this setup and we might find more issues.

Bolke

> On 22 May 2017, at 18:19, Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
> 
> Things that might be needed for a correct multi-schedulers setup:
> * DAG-level lock while being evaluated
> * DAG-level lock expiration, to recover from a potential situation where the
> lock wasn't released
> * Accumulation of the list of task instances to run into the database (as
> opposed to cross-process communication with the master process)
> * Define a clear master cycle that would read the list of accumulated task
> instances from the DB, dedup, prioritize and schedule. That master cycle
> should have a lock (and lock expiration) as well.
> 
> Max
> 
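Max's lock-with-expiration and master-cycle items can be sketched as a lease row in the database. This is a hedged illustration, again with the standard `sqlite3` module: the `lock` and `pending_ti` tables and the `acquire`, `release`, and `master_cycle` functions are hypothetical, not Airflow APIs, and `INSERT OR IGNORE` is SQLite syntax. The lease is taken with a single conditional `UPDATE`, so checking and claiming happen in one statement; expiry is compared against wall-clock time, which assumes the schedulers' clocks are roughly in sync.

```python
import sqlite3
import time


def make_db():
    conn = sqlite3.connect(":memory:")
    # Hypothetical tables: one row per named lock, plus the task instances
    # that DAG evaluations have accumulated for the master cycle.
    conn.execute(
        "CREATE TABLE lock (name TEXT PRIMARY KEY, owner TEXT, expires_at REAL NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE pending_ti ("
        " dag_id TEXT, task_id TEXT, execution_date TEXT, priority INTEGER)"
    )
    return conn


def acquire(conn, name, owner, ttl, now=None):
    """Take the lease if it is free, expired, or already ours (renewal)."""
    now = time.time() if now is None else now
    with conn:
        conn.execute("INSERT OR IGNORE INTO lock VALUES (?, NULL, 0)", (name,))
        cur = conn.execute(
            "UPDATE lock SET owner = ?, expires_at = ? "
            "WHERE name = ? AND (owner IS NULL OR owner = ? OR expires_at < ?)",
            (owner, now + ttl, name, owner, now),
        )
    return cur.rowcount == 1


def release(conn, name, owner):
    """Give the lease up early; a no-op if someone else holds it."""
    with conn:
        conn.execute(
            "UPDATE lock SET owner = NULL, expires_at = 0 WHERE name = ? AND owner = ?",
            (name, owner),
        )


def master_cycle(conn, owner, ttl, now=None):
    """If we hold the master lease, read the accumulated task instances,
    dedup them, and return them highest priority first; else return None."""
    if not acquire(conn, "master_cycle", owner, ttl, now):
        return None
    rows = conn.execute(
        "SELECT dag_id, task_id, execution_date, MAX(priority) FROM pending_ti "
        "GROUP BY dag_id, task_id, execution_date"
    ).fetchall()
    batch = sorted(rows, key=lambda r: (-r[3], r[0], r[1], r[2]))
    with conn:
        # Delete by key, so rows added after the SELECT for other task
        # instances stay queued for the next cycle.
        conn.executemany(
            "DELETE FROM pending_ti WHERE dag_id = ? AND task_id = ? AND execution_date = ?",
            [r[:3] for r in batch],
        )
    return batch  # a real cycle would hand these to the executor here


if __name__ == "__main__":
    conn = make_db()
    print(acquire(conn, "dag:example", "sched-a", ttl=30, now=100))  # True
    print(acquire(conn, "dag:example", "sched-b", ttl=30, now=110))  # False: still held
    print(acquire(conn, "dag:example", "sched-b", ttl=30, now=131))  # True: expired at 130
```

The holder has to call `acquire` again before `ttl` elapses to keep the lease, and a cycle that runs longer than `ttl` could overlap with a new holder, so `ttl` should comfortably exceed the expected cycle time.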
> On Mon, May 22, 2017 at 12:27 AM, Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
>> Hi Stephen,
>> 
>> We are currently stress testing Airflow for use in a multi-master setup.
>> One of my team members is doing a write-up that should show up online
>> shortly. TL;DR: in its current state, Airflow will need some patches in
>> order to run concurrently. One issue is that Airflow can hit a database
>> deadlock, which stops the scheduler from running. I have a patch for
>> that out here (https://github.com/apache/incubator-airflow/pull/2267) that
>> works fine on Postgres/MySQL (tests don't pass on SQLite yet due to
>> limitations of SQLite).
>> 
>> Your global scheduler lock (e.g. via an active/passive configuration) might
>> make the most sense for now.
>> 
>> Bolke
>> 
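For the deadlock issue, a common general-purpose mitigation is to retry the whole transaction when the database picks it as a deadlock victim (MySQL reports error 1213, Postgres SQLSTATE `40P01`); the victim's transaction is rolled back or aborted, so the entire unit of work has to run again. This sketch is not necessarily what PR 2267 does: `DeadlockError`, `is_deadlock`, and `run_with_deadlock_retry` are hypothetical, and a real version would inspect the database driver's exception instead.

```python
import random
import time


class DeadlockError(Exception):
    """Hypothetical stand-in for a driver error such as MySQL 1213 or Postgres 40P01."""


def is_deadlock(exc):
    # A real version would inspect the DB driver's error code here.
    return isinstance(exc, DeadlockError)


def run_with_deadlock_retry(txn, attempts=3, base_delay=0.05, sleep=time.sleep):
    """Run `txn` (a callable performing one whole transaction), retrying it
    with jittered exponential backoff if it is chosen as a deadlock victim."""
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except Exception as exc:
            if not is_deadlock(exc) or attempt == attempts:
                raise
            # Back off base_delay * 2**(attempt - 1), scaled by a random factor
            # in [1, 2) so competing schedulers do not retry in lockstep.
            sleep(base_delay * 2 ** (attempt - 1) * (1 + random.random()))


if __name__ == "__main__":
    calls = []

    def flaky_txn():
        calls.append(1)
        if len(calls) < 3:
            raise DeadlockError("deadlock detected")
        return "committed"

    print(run_with_deadlock_retry(flaky_txn, sleep=lambda s: None))  # committed
```

Retrying only helps if each attempt is a complete transaction; retrying a single statement inside an already-aborted transaction would fail again.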
>>> On 22 May 2017, at 07:52, Stephen Rigney <sjrigney@gmail.com> wrote:
>>> 
>>> Hi,
>>> 
>>> We're running Airflow in production, but for reliability (n.b. not
>>> performance) we'd like to confirm whether it is safe to spawn multiple
>>> instances of the scheduler overlapping in time (otherwise we may need to
>>> put more effort into ensuring two copies are never spawned at once in our
>>> environment).
>>>
>>> It seems this officially wasn't a supported configuration back in 2015
>>> (https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ),
>>> but has sufficient intra-Airflow locking been added that it is now safe to
>>> start up two temporally overlapping instances of the scheduler for the same
>>> Airflow system?
>>>
>>> Or should we hack in a "global scheduler lock"? We're not looking for
>>> increased performance through scheduler parallelism, just that if we ever
>>> fire up two instances of the scheduler, nothing terrible happens.
>>>
>>> Stephen
>> 
>> 
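If the main goal is to ensure two copies of the scheduler are never started at once on the same machine, an exclusive `flock` on a lock file is one low-effort guard: the kernel releases it when the process exits, even after a crash. A minimal sketch, assuming a Unix host; the lock file path and `acquire_single_instance_lock` are hypothetical. It does not coordinate schedulers on different machines, and network filesystems may not honor `flock`, so an active/passive setup across hosts would need a lock in shared storage such as the metadata database.

```python
import fcntl
import os
import tempfile


def acquire_single_instance_lock(path):
    """Return a file descriptor holding an exclusive lock on `path`,
    or None if another open file description already holds it."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        # LOCK_NB: fail immediately instead of waiting for the holder to exit.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd  # keep this fd open for the scheduler's whole lifetime


if __name__ == "__main__":
    # Hypothetical lock file location; every launch path must use the same one.
    lock_path = os.path.join(tempfile.gettempdir(), "airflow-scheduler.lock")
    fd = acquire_single_instance_lock(lock_path)
    if fd is None:
        # A real launcher would exit here, e.g. with a non-zero status.
        print("another scheduler already holds the lock; not starting")
    else:
        print("lock acquired; the scheduler would start here")
```

Compared with a PID file, the lock cannot go stale: there is nothing to clean up after a crash, because the lock disappears with the process.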

