airflow-dev mailing list archives

From "Heistermann, Till" <Till.Heisterm...@blue-yonder.com>
Subject Re: Making Airflow Timezone aware
Date Tue, 14 Nov 2017 15:33:04 GMT
Hi Bolke,

This looks great.

We have needed to run DAGs in different local time zones for a while. So far we have
worked around the limitation at the DAG level to automate most of our DST switches.

How would the approach behave in the DST-Switch corner cases?

For the regular case, I understand that if start_date=datetime(2017, 1, 1, 8, 30, 0, tzinfo="Europe/Amsterdam")
and the schedule is "30 8 * * *", the DST switch would work as expected, and the DAG
would get scheduled at 7:30 UTC in European winter and 6:30 UTC in European summer.
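
To make the arithmetic concrete, here is a minimal sketch using only the Python standard library (zoneinfo, Python 3.9+), not Airflow itself. The helper name local_to_utc and the two sample dates are assumptions chosen for illustration. Note that tzinfo="Europe/Amsterdam" above is shorthand; plain Python needs a tzinfo object.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def local_to_utc(year, month, day, hour, minute):
    """Interpret a wall-clock time in Amsterdam and convert it to UTC."""
    local = datetime(year, month, day, hour, minute, tzinfo=AMSTERDAM)
    return local.astimezone(timezone.utc)


# Hypothetical sample days: one in winter (CET, UTC+1), one in summer (CEST, UTC+2).
winter_run = local_to_utc(2017, 1, 15, 8, 30)
summer_run = local_to_utc(2017, 7, 15, 8, 30)

print(winter_run.strftime("%H:%M"))  # 07:30
print(summer_run.strftime("%H:%M"))  # 06:30
```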

However, if start_date=datetime(2017, 1, 1, 2, 30, 0, tzinfo="Europe/Amsterdam") and
the schedule is "30 2 * * *", would we skip a nightly run in March and have two nightly
runs in October?
This seems like the correct thing to do from a time zone logic point of view, although I can
imagine that there are many operational use cases where the user wants something different.
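
The two corner cases can be checked with the standard library. This is only a sketch of the time zone logic, not of how the scheduler would behave; classify_wall_time is a hypothetical helper, and the dates are the 2017 switch days for Europe/Amsterdam.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

AMSTERDAM = ZoneInfo("Europe/Amsterdam")


def classify_wall_time(naive, tz):
    """Return 'normal', 'missing' or 'ambiguous' for a naive wall-clock time."""
    first = naive.replace(tzinfo=tz, fold=0)
    second = naive.replace(tzinfo=tz, fold=1)
    if first.utcoffset() == second.utcoffset():
        return "normal"
    # A wall-clock time inside the spring gap does not survive a UTC round trip.
    round_trip = first.astimezone(timezone.utc).astimezone(tz).replace(tzinfo=None)
    return "ambiguous" if round_trip == naive else "missing"


spring = classify_wall_time(datetime(2017, 3, 26, 2, 30), AMSTERDAM)
autumn = classify_wall_time(datetime(2017, 10, 29, 2, 30), AMSTERDAM)
regular = classify_wall_time(datetime(2017, 3, 26, 8, 30), AMSTERDAM)

print(spring)   # missing: 02:30 never appears on the clock that night
print(autumn)   # ambiguous: 02:30 appears twice that night
print(regular)  # normal
```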

If start_date=datetime(2017, 1, 1, 8, 30, 0, tzinfo="Europe/Amsterdam") and the schedule
is timedelta(days=14), would a DST switch actually occur?
There is some ambiguity in this case, depending on the timedelta(days=14) being understood
as either “14 days in local calendar” or 14*24*60*60 seconds on the system clock.
I’m not sure what the expected behaviour should be in this case.
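
The two readings can be sketched in plain Python as follows. This says nothing about which one Airflow would pick. The start date 2017-03-12 is an assumption chosen because it lies on the 14-day grid from 2017-01-01 and the next step crosses the March switch.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

AMSTERDAM = ZoneInfo("Europe/Amsterdam")

start = datetime(2017, 3, 12, 8, 30, tzinfo=AMSTERDAM)
step = timedelta(days=14)

# Reading 1: "14 days in the local calendar". Adding a timedelta to an aware
# datetime in Python moves the wall clock and keeps the local time of day.
wall_clock_next = start + step

# Reading 2: 14*24*60*60 elapsed seconds. Do the arithmetic in UTC and
# convert back to local time afterwards.
elapsed_next = (start.astimezone(timezone.utc) + step).astimezone(AMSTERDAM)

print(wall_clock_next.strftime("%Y-%m-%d %H:%M"))  # 2017-03-26 08:30
print(elapsed_next.strftime("%Y-%m-%d %H:%M"))     # 2017-03-26 09:30
```

With the first reading the run stays at 08:30 local time but only 13 days and 23 hours pass; with the second, exactly 14 days pass and the local run time drifts by an hour.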

Cheers,
Till


On 13.11.17, 19:47, "Ash Berlin-Taylor" <ash_airflowlist@firemirror.com> wrote:

    This sounds like an awesome change!
    
    I'm happy to review (will take a look tomorrow) but won't be a suitable tester as all
    our DAGs operate in UTC.
    
    -ash
    
    
    > On 13 Nov 2017, at 18:09, Bolke de Bruin <bdbruin@gmail.com> wrote:
    > 
    > Hi All,
    > 
    > I just want to make you aware that I am creating patches that make Airflow timezone
    > aware. The gist of the idea is that Airflow internally will use and store UTC everywhere.
    > This allows you to have start_date = datetime(2017, 1, 1, tzinfo="Europe/Amsterdam") and
    > Airflow will properly take care of daylight saving time. If you are using cron we will make
    > sure to always run at the exact time (end of interval of course) which you specify even when
    > DST is in effect, e.g. 8.00am is always 8.00am regardless of whether a daylight saving time
    > switch has happened. DAGs that don’t have a timezone associated get a default timezone that
    > is configurable.
    > 
    > In AIRFLOW-288 I am tracking what needs to be done, but I am 80% there. As the patches
    > are invasive, particularly in tests (basically everything needs a timezone) and less so in
    > other areas, I would like to draw special attention to a couple of places where this has
    > an impact.
    > 
    > 1. All database DateTime fields are converted to timezone aware Timestamp fields.
    > This impacts MySQL deployments particularly as MySQL was storing DateTime fields, which cannot
    > be made timezone aware. Also, to make sure conversion happens properly we set the connection
    > time zone to UTC. This is supported by Postgres and MySQL. However, it is not supported by
    > SQLServer. So if you are running outside of UTC you need to take special care when upgrading.
    > 
    > 2. Thou shalt not use datetime.now() and datetime.utcnow() when writing code for
    > core Airflow (operators, sensors, scheduler etc.); in DAGs you can still use them. Both
    > create naive datetimes (yes, even utcnow()). You can use airflow.utils.timezone.utcnow() for
    > this. As you will not be able to store naive datetime fields anymore you will notice soon
    > enough.
    > 
    > Finally, and that is the main reason for this email, I am looking for feedback and
    > testers. The PR can be found here: https://github.com/apache/incubator-airflow/pull/2781
    > It doesn’t pass the tests yet, but you can see that I am working hard on that ;-).
    > 
    > Cheers
    > Bolke
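
On point 2 of the quoted message, the difference between naive and aware datetimes can be seen with the standard library alone. The helper aware_utcnow below is a hypothetical stand-in for illustration and is not the implementation in airflow.utils.timezone. datetime.utcnow() is naive in the same way as datetime.now(); the sketch uses now() because utcnow() is deprecated as of Python 3.12.

```python
from datetime import datetime, timezone


def aware_utcnow():
    """Hypothetical stand-in: current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


naive = datetime.now()   # no tzinfo attached
aware = aware_utcnow()   # carries tzinfo=UTC

print(naive.tzinfo)  # None
print(aware.tzinfo)  # UTC

# Mixing the two kinds fails loudly, which is why naive values get noticed.
try:
    naive < aware
    mixed_comparison_failed = False
except TypeError:
    mixed_comparison_failed = True

print(mixed_comparison_failed)  # True
```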
    
    
