airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Adrian Bridgett <adr...@opensignal.com>
Subject ExternalTaskSensor offsets with monthly jobs
Date Wed, 29 Jun 2016 13:55:33 GMT
I'm hitting a bit of an annoying problem and wondering about the best 
course of action.

We have several dags:
- a daily ETL job
- several reporting jobs (daily, weekly or monthly) which use the data 
from previous ETL jobs

I wish to have a dependency such that the reporting jobs depend upon the 
last ETL job that the report uses.   We're happy to set depends_on_past 
in the ETL job.

Daily jobs are easy - ExternalTaskSensor, job done.
Weekly jobs are a little trickier - we need to work out the 
execution_delta - normally +6 for us (we deliberately run a day late to 
prioritise other jobs).
Monthly jobs.... this is where I'm struggling - how to work out the 
execution_delta.   I guess the ideal would be an upgrade from timedelta 
to dateutil.relativedelta?   tomorrow_ds and ds_add don't help either.

I must admit, ds being the time that's just gone has caused me no end of 
brain befudledness, especially when trying to get the initial job right 
(so much so that I wrote this up in our DAG README, posting here for 
others):

When adding a new job, it's critical to ensure that you've set the 
schedule correctly:
- frequency (monthly, weekly, daily)
- schedule_interval ("0 0 2 * *", "0 0 * * 0", "0 0 * * *")
- start_date (choose a day that matches schedule_interval at least one 
interval ago)
-- e.g if today is Thursday 2016-06-09, go back in time to when the 
schedule will trigger,
    then work out what "ds" (execution date) would be (remembering 
that's the lapsed date)
--- for a monthly job, last trigger=2016-06-02, ds=2016-05-02
--- for a weekly job, last trigger=2016-06-05, ds=2016-05-29
--- for a daily job, last trigger=2016-06-09, ds=2016-06-08

Mime
View raw message