airflow-dev mailing list archives

From Jeremiah Lowin <>
Subject Re: Celery or Dask?
Date Mon, 13 Feb 2017 17:26:15 GMT
As far as I know I'm the only person using Dask with Airflow at the moment.
I've been using Dask for a variety of other (non-Airflow) tasks and have
found it to be a great tool. However, it's important to note that Celery is
a much more mature project with finer control over how tasks are executed.
In fact Dask's objectives are totally different (I think of it as
"pure-Python Spark") but it happens to expose similar functionality to
Celery through its Distributed subproject.

I added a DaskExecutor to Airflow in my last commit and am working on
improving the unit tests now. I've been running the DaskExecutor in a test
environment with good results, but between the fact that you have to run
Airflow's bleeding-edge master branch to get it and that I'm the only
person kicking its tires (at the moment), I would only recommend using it
if you like to live very dangerously indeed.

In the near future, I can see Dask being a recommended way to scale Airflow
beyond a single machine due to the ease of setting it up -- but not yet.

On Mon, Feb 13, 2017 at 11:04 AM Bolke de Bruin <> wrote:

Dask just landed in master. So no, Celery is the most used option to

Always interested in what you are running into, but please be prepared to
provide a lot of info on your setup.

- Bolke

> On 13 Feb 2017, at 17:01, EKC (Erik Cederstrand) <>
> Hello all,
> I'm investigating why some of our DAGs are not being scheduled properly (
> ran into, among other
> things). Coupled with comments on this list, I'm getting the impression
> that Celery is a second-class citizen and core developers are mainly using
> Dask. Is this correct?
> If Dask support is simply more mature and more likely to have issues
> responded to, I'll consider migrating our installation.
> Thanks,
> Erik
