airflow-dev mailing list archives

From Alex Guziel <alex.guz...@airbnb.com.INVALID>
Subject Re: Sensor slots utilization
Date Fri, 28 Jul 2017 18:27:53 GMT
I'm concerned that we would be making the logic more complex, unless the
new sensor 'pokeonce' case is just a high number of retries. And the other
overhead of course.
Running the poke method inline wouldn't be great for perf either, since it's
blocking I/O and would need to be handled async in order to not slow down
scheduling.

FWIW, our current setup at Airbnb has a separate queue for sensors with a
high number of slots per worker.
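
To make the overhead concern concrete, here is a rough back-of-the-envelope sketch of slot time under the two behaviors discussed in this thread. It is only a toy cost model, not Airflow code: the function names, the parameters (ready_at, poke_interval, poke_cost, startup_cost) and the example numbers are all assumptions made for illustration, and it ignores queueing delay and scheduler load entirely.

```python
import math


def slot_seconds_blocking(ready_at, poke_interval, poke_cost):
    """Slot time when the sensor keeps its slot and sleeps between pokes.

    Hypothetical model: pokes happen at t = 0, poke_interval, 2 * poke_interval, ...
    and the first poke at or after `ready_at` succeeds.
    """
    successful_poke = math.ceil(ready_at / poke_interval)
    # The slot is held from t = 0 until the successful poke finishes.
    return successful_poke * poke_interval + poke_cost


def slot_seconds_poke_once(ready_at, poke_interval, poke_cost, startup_cost):
    """Slot time when each poke is its own short task that then frees the slot.

    Hypothetical model: every poke pays `startup_cost` (task startup) plus
    `poke_cost`, and the slot is free while waiting for the next attempt.
    """
    pokes = math.ceil(ready_at / poke_interval) + 1  # includes the poke at t = 0
    return pokes * (startup_cost + poke_cost)


# Condition becomes true after an hour, poked every minute, 1s per poke.
blocking = slot_seconds_blocking(3600, 60, 1)
cheap_startup = slot_seconds_poke_once(3600, 60, 1, startup_cost=10)
costly_startup = slot_seconds_poke_once(3600, 60, 1, startup_cost=120)
```

Under these made-up numbers, poke-once uses far less slot time when startup is cheap, and more than the blocking behavior once startup cost exceeds the poke interval, which is the trade-off between slot utilization and startup overhead raised in the quoted message.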

On Fri, Jul 28, 2017 at 11:14 AM, Maxime Beauchemin <
maximebeauchemin@gmail.com> wrote:

> Thought this was interesting to bubble up to the mailing list. From:
> https://github.com/apache/incubator-airflow/pull/2423#issuecomment-318723842
>
> This is about the issue around sensors utilizing a lot of worker slots. The
> context is a PR from @shaform introducing sensors that check once, give
> up their slot, and get rescheduled for each sensing operation (as opposed to
> the current behavior of sleeping and poking while constantly using the slot
> until the criteria is met or the timeout is reached).
>
> ---------------
>
> *So this is legitimate, but shifts some of the burden of slot utilization
> towards other costs like task startup costs and more communication
> overhead. These costs may be preferable depending on the
> scenario/environment. Starting a task can have significant overhead
> depending on the size of the DAG and other factors that depend on the
> executor. Say for the upcoming Kubernetes executor, startup may include
> booting up a docker instance and doing a shallow clone of the repo.*
>
> *Since this is a major change, I would argue that we shouldn't change the
> current default since organizations have provisioned and stabilized their
> environments based on the current behavior. Default behavior could be
> changed when moving to 2.0, which isn't really planned or scheduled at the
> moment.*
>
> *Another idea around reducing the overall sensor slot utilization would be
> to move that burden towards the scheduler (let's call it the supervisor now
> since it does more than just scheduling at this point). My idea there was
> to add a flag to BaseSensorOperator that would tell the scheduler to run
> the poke method in line with the scheduling instead of using the executor.
> In that scenario, there's no startup cost and no communication overhead.
> The downside is that it can slow down the scheduler. This would be a great
> option where sensing is cheap and fast.*
>
> *That gives us potentially 3 sensor_modes, which I would argue should be
> implemented as a BaseOperator argument. Derivative classes can decide to
> expose the argument or force it. Administrators could also use
> the policy function to force a certain sensing mode in certain or all
> contexts in their environment.*
>
> Max
>
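
The three sensor_modes proposal above could look roughly like the following. This is a minimal standalone sketch, not the Airflow API: SensorMode, SketchSensor, CheapSensor, forced_mode and the body of policy() are hypothetical names invented here, and the mode labels are placeholders since the thread does not name them. It only shows the shape of "an argument that derivative classes can expose or force, and that an administrator's policy function can override".

```python
from enum import Enum


class SensorMode(Enum):
    """Hypothetical labels for the three modes discussed in the thread."""
    BLOCKING = "blocking"    # current behavior: sleep and poke while holding the slot
    POKE_ONCE = "poke_once"  # poke once, give up the slot, get rescheduled
    INLINE = "inline"        # scheduler runs the poke method itself


class SketchSensor:
    """Stand-in for a sensor base class; not the real BaseSensorOperator."""

    # Derivative classes may set this to force a mode regardless of the argument.
    forced_mode = None

    def __init__(self, task_id, sensor_mode=SensorMode.BLOCKING):
        self.task_id = task_id
        # Accept either a SensorMode member or its string value.
        self.sensor_mode = self.forced_mode or SensorMode(sensor_mode)


class CheapSensor(SketchSensor):
    """A sensor whose poke is assumed cheap and fast, so it forces inline mode."""
    forced_mode = SensorMode.INLINE


def policy(task):
    """Hypothetical administrator hook applied to every task.

    Here it moves every blocking sensor to poke-once and leaves others alone.
    """
    if task.sensor_mode is SensorMode.BLOCKING:
        task.sensor_mode = SensorMode.POKE_ONCE


default_sensor = SketchSensor("wait_for_partition")
inline_sensor = CheapSensor("wait_for_flag", sensor_mode="blocking")
for task in (default_sensor, inline_sensor):
    policy(task)
```

After the loop, default_sensor ends up in poke-once mode through the policy, while inline_sensor stays inline because its class forces that mode. Keeping the current behavior as the default argument matches the point above about not changing the default for existing environments.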
