airflow-dev mailing list archives

From Ash Berlin-Taylor <ash_airflowl...@firemirror.com>
Subject Re: Using large numbers of sensors, resource consumption
Date Tue, 10 Jul 2018 14:22:34 GMT
We are also using the "high number of retries" pattern rather than sensors (S3KeySensor in
our case) for similar reasons - we have data for a week that arrives "at some point after
Thursday midnight", but it can take 5 or even 8 days to arrive. Yay third parties.

It would be nice to have a different kind of sensor (or a flag on the existing ones) so that
rather than sitting in a busy loop on an executor they just go back and re-schedule themselves.
We just haven't gotten around to writing it (we being where I work).
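
Until then, the closest approximation I can think of with the current primitives is a sensor
that pokes exactly once per task attempt and otherwise fails straight away, letting retries do
the scheduling. An untested sketch (the class name is made up):

    from airflow.exceptions import AirflowException
    from airflow.operators.sensors import S3KeySensor

    class PokeOnceS3KeySensor(S3KeySensor):
        """Poke a single time, then hand the executor slot straight back.

        Pair with a high `retries` count and a `retry_delay` that acts as
        the effective poke interval.
        """
        def execute(self, context):
            if not self.poke(context):
                raise AirflowException('Key not present yet; retry later')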

-ash

> On 10 Jul 2018, at 15:05, Pedro Machado <pedro@205datalab.com> wrote:
> 
> I have a few DAGs that use time sensors to wait until data is ready, which
> can take several days.
> 
> I have one daily DAG where, for each execution date, I have to re-pull the
> data for the next 7 days to capture changes (late-arriving revenue data).
> This DAG currently starts 7 TimeDeltaSensors for each execution date, with
> delays that range from 0 to 6 days.
> 
> I was wondering what the recommendation is for cases like this where a
> large number of sensors is needed.
> 
> Are there ways to reduce the footprint of these sensors so that they use
> less CPU and memory?
> 
> I noticed that in one of the DAGs Germain Tanguy showed in the presentation
> he shared today, a sensor was set to time out after 30 seconds but had a
> large retry count, so instead of running constantly it runs for 30 seconds
> every 15 minutes and then dies.
> 
> Are other people using this pattern? Do you have other suggestions?
> 
> Thanks,
> 
> Pedro

