airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: Airflow scheduler/worker inefficient time
Date Fri, 03 Jun 2016 18:31:55 GMT
Note that in general, Airflow isn't designed to run thousands of small
tasks per minute. The Celery library on its own does that well without any
oversight from Airflow, though then you miss out on what Airflow has to
provide (complex dependency management, state handling, logging, retries,
...).

Airflow typically assumes long-running batch processes, in the minutes to
hours range. If you need sub-second or even sub-minute latency between your
tasks, Airflow probably isn't the right choice.
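A quick back-of-the-envelope check with the figures Pavlo reports in the quoted thread (about 250 dummy tasks, about 1000 s end to end, at most 0.1 s of work per task) shows how little of the run is actual task execution. This is a minimal sketch; the function names are hypothetical and nothing here is part of Airflow's API. The bound holds however the tasks overlap, because task execution can cover at most `n_tasks * max_task_s` seconds of wall-clock time.

```python
def avg_wall_per_task(wall_clock_s: float, n_tasks: int) -> float:
    """Average wall-clock seconds per task over the whole run (hypothetical helper)."""
    return wall_clock_s / n_tasks


def idle_fraction(wall_clock_s: float, n_tasks: int, max_task_s: float) -> float:
    """Lower bound on the share of wall-clock time during which no task is
    executing. Task execution covers at most n_tasks * max_task_s seconds,
    whether the tasks ran one after another or in parallel."""
    busy_upper_bound = min(wall_clock_s, n_tasks * max_task_s)
    return 1.0 - busy_upper_bound / wall_clock_s


if __name__ == "__main__":
    # Figures reported in this thread: ~250 dummy tasks, ~1000 s total, <= 0.1 s each.
    wall, tasks, per_task = 1000.0, 250, 0.1
    print(f"avg wall-clock per task: {avg_wall_per_task(wall, tasks):.1f} s")
    print(f"time with no task executing: >= {idle_fraction(wall, tasks, per_task):.1%}")
```

For these numbers that is an average of 4 s of wall-clock time per task, and at least 97.5% of the run passes with no task executing, i.e. it is spent in scheduling, queueing and process startup rather than in the tasks themselves. That is the regime described above, where per-task overhead dominates short tasks.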

One goal we have for the project is to allow for the scheduler to trigger
jobs roughly every minute and maintain that at scale.

Max

On Fri, Jun 3, 2016 at 10:07 AM, Ryabchuk, Pavlo <
ext-pavlo.ryabchuk@here.com> wrote:

> Hey,
> I had a look at this Celery config option, but no luck. I also tried
> setting the executor to LocalExecutor, with the same result.
> Each task takes no more than 0.1 sec, but the overall time is huge.
> I thought it could be due to disabled pickling, so I enabled it: almost no
> change :(
>
> -----Original Message-----
> From: Bolke de Bruin [mailto:bdbruin@gmail.com]
> Sent: Monday, May 30, 2016 3:09 PM
> To: dev@airflow.incubator.apache.org
> Subject: Re: Airflow scheduler/worker inefficient time
>
> Have a look at this: https://github.com/apache/incubator-airflow/pull/1509
>
>
> > On 30 May 2016, at 14:03, Ryabchuk, Pavlo <ext-pavlo.ryabchuk@here.com>
> > wrote:
> >
> > Hi all,
> >
> > Maybe I am misusing Airflow a bit, because I am using it as an on-demand
> > (triggered) complex data processing system, but still, the question is:
> > which parameters should I play around with to speed up execution?
> > I have around 250 Dummy tasks (which do nothing) in my DAG, and running
> > it locally with the Celery executor takes around 1000 sec, which is pretty
> > strange. I've noticed that a single Dummy task takes only milliseconds.
> > I've tried playing around with Celery concurrency, Airflow executor
> > parallelism and heartbeat, but with almost no result... it's really
> > strange. What am I doing wrong? :)
> >
> > Best,
> > Pavlo
> >
> >
>
