airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: Concurrent schedulers
Date Mon, 22 May 2017 07:27:33 GMT
Hi Stephen,

We are currently stress testing Airflow for use in a multi-master setup. One of my team members
is doing a write-up that should show up online shortly. TL;DR: in its current state Airflow
will need some patches in order to run schedulers concurrently. One issue is that Airflow can hit a
database deadlock, which stops the scheduler from running. I have a patch for that out
here (https://github.com/apache/incubator-airflow/pull/2267)
that works fine on Postgres/MySQL (tests don’t pass on SQLite yet due to limitations of
SQLite).

Your global scheduler lock (e.g. via an active/passive configuration) might make the most sense
for now.
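
As a rough illustration of what such a lock could look like, here is a minimal sketch of a
wrapper that refuses to start a second scheduler while one is already running. It is not part
of Airflow: the lock path and the function names are hypothetical, and it assumes both
schedulers would be started on the same Unix host, since it relies on fcntl.flock. A setup
spanning several hosts would need a shared lock instead (for example in the metadata database).

```python
import fcntl
import subprocess
import sys

# Hypothetical lock file location; any path writable by the scheduler user works.
LOCK_PATH = "/tmp/airflow-scheduler.lock"


def acquire_scheduler_lock(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if it is already held."""
    handle = open(path, "a+")
    try:
        # LOCK_NB makes the call fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        handle.close()
        return None
    return handle


def run_exclusively(command, path=LOCK_PATH):
    """Run `command` only if no other holder of the lock exists; return its exit code."""
    lock = acquire_scheduler_lock(path)
    if lock is None:
        print("another scheduler holds the lock; not starting", file=sys.stderr)
        return 1
    try:
        # pass_fds lets the child inherit the locked descriptor, so the lock
        # stays held for as long as either the wrapper or the child is alive.
        return subprocess.call(command, pass_fds=(lock.fileno(),))
    finally:
        lock.close()
```

Calling run_exclusively(["airflow", "scheduler"]) from a small launcher script would then start
the scheduler at most once per host. The kernel drops the lock when the processes holding it
exit, so a crashed scheduler does not leave a stale lock behind.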

Bolke

> On 22 May 2017, at 07:52, Stephen Rigney <sjrigney@gmail.com> wrote:
> 
> Hi,
> 
> We're running airflow in production, but for reliability (n.b. not
> performance) we'd like to confirm if it is safe to spawn multiple instances
> of the scheduler overlapping in time (otherwise we may need to put more
> effort into assuring two copies aren't ever spawned at once in our
> environment).
> 
> 
> It seems this officially wasn't a supported configuration back in 2015 (
> https://groups.google.com/d/msg/airbnb_airflow/-1wKa3OcwME/uATa8y3YDAAJ ),
> but has sufficient intra-airflow locking been added that it is now safe to
> start up two temporally overlapping instances of the scheduler for the same
> airflow system?
> 
> 
> Or should we hack in a "global scheduler lock" - we're not looking for
> increased performance by scheduler parallelism, just that if we ever fire
> up two instances of the scheduler nothing terrible happens?
> 
> 
> Stephen

