airflow-commits mailing list archives

From "George Leslie-Waksman (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (AIRFLOW-1930) start_date and execution_date should default to timezone.utcnow() not to func.now()
Date Mon, 08 Jan 2018 20:56:00 GMT

    [ https://issues.apache.org/jira/browse/AIRFLOW-1930?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16317049#comment-16317049 ]

George Leslie-Waksman commented on AIRFLOW-1930:
------------------------------------------------

1. This is a strong point, and I empathize with the maintainability issue. I do, however,
worry that allowing race conditions due to clock skew will produce bugs that are harder to
track down and fix than the cost of maintaining `now` functions for different databases.

2. I don't understand why we need to worry about database configuration. If the column is
timezone-aware and each value is written with the timezone that matches it, won't using
tz-aware datetime objects take care of the rest for us? If the DB is in PST and knows it,
things should "just work". If the DB is in PST and thinks it's in EST, I don't see how that
should be Airflow's responsibility to figure out.
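
To illustrate what I mean by "just work", here is a minimal sketch using only the Python
standard library. The fixed-offset `PST` and `EST` zones and the sample timestamp are
assumptions made up for the example; this is not Airflow code.

```python
from datetime import datetime, timedelta, timezone

UTC = timezone.utc
PST = timezone(timedelta(hours=-8), "PST")  # fixed offsets, assumed for illustration
EST = timezone(timedelta(hours=-5), "EST")

# A value written by a DB that is in PST and knows it.
written_in_pst = datetime(2018, 1, 8, 12, 56, tzinfo=PST)

# Reading it back as UTC gives the same instant with different wall-clock fields.
read_as_utc = written_in_pst.astimezone(UTC)

# Aware datetimes compare by instant, so nothing needs converting by hand.
same_instant = written_in_pst == read_as_utc

# A DB that is in PST but thinks it is in EST labels the same wall clock wrongly.
mislabelled = written_in_pst.replace(tzinfo=EST)
error = mislabelled - written_in_pst

print(read_as_utc.isoformat())  # 2018-01-08T20:56:00+00:00
print(same_instant)             # True
print(error)                    # -1 day, 21:00:00 (three hours early)
```

The correctly labelled value round-trips without any help from the application. The
mislabelled one is off by exactly the difference between the two offsets, which no amount
of client-side logic can detect from the value alone.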

3. For me, it's less about added value and more about decreased risk. Although rarely an
issue, clock skew does happen and we want Airflow to be resilient to it. Time servers
go down, NTP fails, light only travels so fast. Celery will certainly complain but it won't
necessarily do anything to mitigate the problem. This makes it possible for a scheduler
to be running slow, a worker to be running fast, and for us to end up with tasks that
start (and finish) before they are even scheduled (according to the metadata db). Or, similarly,
you could have tasks finish before their dependencies start (again according to the metadata
db).
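
A rough simulation of the failure mode, as a sketch only. The skew values, the two-second
gap, and the `stamp` helper are all hypothetical. Whichever way the skew runs in practice,
the anomaly appears when the clock that stamps the earlier event is ahead of the clock that
stamps the later one by more than the real gap between the two events.

```python
from datetime import datetime, timedelta, timezone


def stamp(true_time, skew):
    """Return the timestamp a host with the given clock skew would record."""
    return true_time + skew


# True (unobservable) time at which the scheduler queues the task.
t0 = datetime(2018, 1, 8, 20, 0, 0, tzinfo=timezone.utc)
real_gap = timedelta(seconds=2)  # true delay before the worker starts the task

# Hypothetical skews: each host stamps events with its own local clock.
scheduler_skew = timedelta(seconds=5)   # scheduler clock is 5 s ahead
worker_skew = timedelta(seconds=-5)     # worker clock is 5 s behind

queued_at = stamp(t0, scheduler_skew)
started_at = stamp(t0 + real_gap, worker_skew)
starts_before_scheduled = started_at < queued_at

# With one clock stamping both events (here a DB that is itself 30 s off),
# the absolute values are wrong but the ordering and the gap are preserved.
db_skew = timedelta(seconds=30)
queued_db = stamp(t0, db_skew)
started_db = stamp(t0 + real_gap, db_skew)
ordered = queued_db < started_db

print(starts_before_scheduled)  # True
print(ordered)                  # True
```

The point of the second half is that a single clock does not have to be accurate to keep
events in the right order; it only has to be the same clock for every write.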

I would think we want to use a single source of truth for time, if at all possible. So, I'd
say we want to use database server time for everything.
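
As a sketch of what "ask the database for the time" can look like, here is a small example
against an in-memory SQLite database, whose `CURRENT_TIMESTAMP` is documented to return UTC.
The `db_utcnow` helper and the `task_instance` table layout are hypothetical and are not the
`sql_utcnow` discussed in this issue. With an in-process SQLite database the clock is the
local one, so this only shows the pattern, not protection from skew.

```python
import sqlite3
from datetime import datetime, timezone

SQLITE_FORMAT = "%Y-%m-%d %H:%M:%S"  # format of SQLite's CURRENT_TIMESTAMP


def parse_utc(text):
    """Parse a SQLite UTC timestamp string into an aware datetime."""
    return datetime.strptime(text, SQLITE_FORMAT).replace(tzinfo=timezone.utc)


def db_utcnow(conn):
    """Hypothetical helper: read the current UTC time from the database."""
    (text,) = conn.execute("SELECT CURRENT_TIMESTAMP").fetchone()
    return parse_utc(text)


conn = sqlite3.connect(":memory:")
# Hypothetical table: the column default is evaluated by the database, not the client.
conn.execute(
    "CREATE TABLE task_instance ("
    " task_id TEXT PRIMARY KEY,"
    " start_date TEXT DEFAULT CURRENT_TIMESTAMP)"
)
conn.execute("INSERT INTO task_instance (task_id) VALUES ('example_task')")

(stored,) = conn.execute(
    "SELECT start_date FROM task_instance WHERE task_id = 'example_task'"
).fetchone()
start_date = parse_utc(stored)

print(start_date.tzinfo)              # UTC
print(start_date <= db_utcnow(conn))  # True
```

Every writer that goes through the column default or a helper like this gets its timestamp
from the same clock, regardless of which host issued the statement.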

In what situations won't `sql_utcnow` work?

> start_date and execution_date should default to timezone.utcnow() not to func.now()
> -----------------------------------------------------------------------------------
>
>                 Key: AIRFLOW-1930
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-1930
>             Project: Apache Airflow
>          Issue Type: Bug
>    Affects Versions: 1.9.0, 1.8.2
>            Reporter: Bolke de Bruin
>            Assignee: Bolke de Bruin
>             Fix For: 1.9.1
>
>
> func.now() defaults to the time zone of the database, while we assume every date in the
> db is UTC.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
