airflow-dev mailing list archives

From George Leslie-Waksman <geo...@cloverhealth.com.INVALID>
Subject Re: Making Airflow Timezone aware
Date Thu, 16 Nov 2017 00:33:59 GMT
Really happy to hear this moving forward. Thanks Bolke!

On Tue, Nov 14, 2017 at 7:44 AM Bolke de Bruin <bdbruin@gmail.com> wrote:

> See inline answers below.
>
> Sent from my iPad
>
> > On 14 Nov 2017, at 16:33, Heistermann, Till <
> Till.Heistermann@blue-yonder.com> wrote:
> >
> > Hi Bolke,
> >
> > This looks great.
> >
> > We have needed to run DAGs in different local time zones for a while.
> So far we have worked around the limitation at the DAG level to automate
> most of our DST switches.
> >
> > How would the approach behave in the DST-Switch corner cases?
> >
> > For the regular case, I understand that if start_date=datetime(2017, 1,
> 1, 8, 30, 0, tzinfo="Europe/Amsterdam") and the schedule is "30 8 * * *",
> the DST switch would work as expected, and the DAG would get scheduled at
> 7:30 UTC in European winter and 6:30 UTC in European summer.
>
> Actually no. For cron-defined schedules we will always use naive local
> time. This means your 8:30 schedule will always run at 8:30 local time,
> regardless of DST.
>
> >
> > However, if start_date=datetime(2017, 1, 1, 2, 30, 0,
> tzinfo="Europe/Amsterdam") and the schedule is "30 2 * * *", would we skip
> a nightly run in March and have two nightly runs in October?
> > This seems like the correct thing to do from a time zone logic point of
> view, although I can imagine that there are many operational use cases
> where the user wants something different.
>
> I have to verify what happens. I think it will run at 3:30, because we
> convert to naive local time (DST-unaware), add the interval, and convert
> back to UTC. That UTC time will then translate to 3:30 local time,
> which is btw equal to 2:30 local time.
>
> Execution_date will be in UTC. The DAG will store time zone information so
> you can decide yourself what you want to do with that.
>
>
> >
> > If start_date=datetime(2017, 1, 1, 8, 30, 0, tzinfo="Europe/Amsterdam")
> and the schedule is timedelta(days=14), would a DST switch actually occur?
> > There is some ambiguity in this case, depending on whether
> timedelta(days=14) is understood as "14 days in the local calendar"
> or as 14*24*60*60 seconds on the system clock.
> > I’m not sure what the expected behaviour should be in this case.
>
> For timedeltas, DST is in effect. It is assumed here that you want to run X
> hours later, not at a specific time. Obviously, if you want to keep the old
> behavior (and this is the default), keep your timezone at UTC.
>
> >
> > Cheers,
> > Till
> >
> >
> > On 13.11.17, 19:47, "Ash Berlin-Taylor" <ash_airflowlist@firemirror.com>
> wrote:
> >
> >    This sounds like an awesome change!
> >
> >    I'm happy to review (will take a look tomorrow) but won't be a
> suitable tester as all our DAGs operate in UTC.
> >
> >    -ash
> >
> >
> >> On 13 Nov 2017, at 18:09, Bolke de Bruin <bdbruin@gmail.com> wrote:
> >>
> >> Hi All,
> >>
> >> I just want to make you aware that I am creating patches that make
> Airflow timezone aware. The gist of the idea is that Airflow internally
> will use and store UTC everywhere. This allows you to have start_date =
> datetime(2017, 1, 1, tzinfo="Europe/Amsterdam") and Airflow will properly
> take care of daylight saving time. If you are using cron, we will make
> sure to always run at the exact time (end of interval of course) which you
> specify even when DST is in effect, e.g. 8:00 am is always 8:00 am regardless
> of whether a daylight saving time switch has happened. DAGs that don’t have
> a timezone associated get a default timezone that is configurable.
> >>
> >> I am tracking what needs to be done in AIRFLOW-288, and I am 80% there.
> The patches are invasive, particularly in tests (basically everything needs
> a timezone) and less so in other areas, so I would like to draw special
> attention to a couple of places where this has impact.
> >>
> >> 1. All database DateTime fields are converted to timezone aware
> Timestamp fields. This impacts MySQL deployments particularly as MySQL was
> storing DateTime fields, which cannot be made timezone aware. Also, to make
> sure conversion happens properly we set the connection time zone to UTC.
> This is supported by Postgres and MySQL. However, it is not supported by
> SQL Server. So if you are running outside of UTC, you need to take special
> care when upgrading.
> >>
> >> 2. Thou shalt not use datetime.now() and datetime.utcnow() when writing
> code for core Airflow (operators, sensors, scheduler, etc.); in DAGs you can
> still use them. Both create naive datetimes (yes, even utcnow()). You can
> use utcnow() from airflow.utils.timezone for this. As you will not be able
> to store naive datetime fields anymore, you will notice soon enough.
> >>
> >> Finally, and that is the main reason for this email, I am looking for
> feedback and testers. The PR can be found here:
> https://github.com/apache/incubator-airflow/pull/2781. It doesn’t pass the
> tests yet, but you can see that I am working hard on that ;-).
> >>
> >> Cheers
> >> Bolke
> >
> >
> >
>
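
The cron-versus-timedelta distinction discussed in the thread can be sketched with the standard library alone. This is only an illustration under stated assumptions, not Airflow's implementation: the helper names next_run_wall_clock and next_run_elapsed are hypothetical, and zoneinfo (Python 3.9+) stands in for whatever timezone handling the patches use. The last lines show how the standard library resolves the non-existent 02:30 on the spring switch day, which may or may not match what the scheduler ends up doing.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # standard library since Python 3.9

AMS = ZoneInfo("Europe/Amsterdam")
UTC = timezone.utc


def next_run_wall_clock(prev_utc, tz, step):
    """Cron-style (hypothetical helper): step in naive local time, then go back to UTC."""
    local_naive = prev_utc.astimezone(tz).replace(tzinfo=None)
    return (local_naive + step).replace(tzinfo=tz).astimezone(UTC)


def next_run_elapsed(prev_utc, step):
    """timedelta-style (hypothetical helper): step in UTC, i.e. by real elapsed time."""
    return prev_utc + step


# 08:30 local on 25 March 2017, the day before clocks go forward in Amsterdam.
prev = datetime(2017, 3, 25, 8, 30, tzinfo=AMS).astimezone(UTC)
day = timedelta(days=1)

wall = next_run_wall_clock(prev, AMS, day)
elapsed = next_run_elapsed(prev, day)

print(wall.astimezone(AMS))     # 2017-03-26 08:30:00+02:00 (same wall-clock time)
print(elapsed.astimezone(AMS))  # 2017-03-26 09:30:00+02:00 (24 real hours later)

# 02:30 does not exist on 26 March 2017; round-tripping through UTC
# lands on 03:30 local time with the standard library's default (fold=0).
gap_local = datetime(2017, 3, 26, 2, 30, tzinfo=AMS).astimezone(UTC).astimezone(AMS)
print(gap_local)                # 2017-03-26 03:30:00+02:00
```

The wall-clock variant keeps the local hour fixed and lets the UTC time shift by an hour across the switch, while the elapsed variant keeps the UTC spacing fixed and lets the local hour shift.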

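Point 2 of the original announcement rests on the difference between naive and aware datetimes. A minimal standard-library sketch of that difference follows. It does not import Airflow; is_naive and utcnow_aware are hypothetical names used only for illustration, and utcnow_aware is merely a stand-in for an aware "now" helper such as the one the email mentions.

```python
from datetime import datetime, timezone


def is_naive(dt):
    """True when dt carries no usable UTC offset (hypothetical helper)."""
    return dt.tzinfo is None or dt.tzinfo.utcoffset(dt) is None


def utcnow_aware():
    """Hypothetical stand-in for an aware 'now' helper; not Airflow's code."""
    return datetime.now(timezone.utc)


# The same clock reading, once without and once with timezone information.
naive = datetime(2017, 11, 14, 15, 44)
aware = datetime(2017, 11, 14, 15, 44, tzinfo=timezone.utc)

print(is_naive(naive))           # True
print(is_naive(aware))           # False
print(is_naive(utcnow_aware()))  # False

# Ordering a naive value against an aware one is an error in Python,
# which is one reason to keep every value aware.
try:
    naive < aware
    mixed_ok = True
except TypeError:
    mixed_ok = False
print(mixed_ok)                  # False
```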