airflow-dev mailing list archives

From Ash Berlin-Taylor <ash_airflowl...@firemirror.com>
Subject Re: [VOTE] Airflow 1.10.0rc3
Date Sun, 05 Aug 2018 21:08:32 GMT
Yup, just worked out the same thing.

I think, as "punishment" for finding bugs so late in two RCs (this one, and 1.9), I should
run the release process for the next version.

-ash

> On 5 Aug 2018, at 22:05, Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
> Yeah, I figured it out. Originally I was using a different implementation of UTCDateTime,
> but that was unmaintained. I switched, but this version changed or has a different contract:
> while it transforms to UTC on storing, it does not do so when it receives timezone-aware
> fields from the db. Hence the issue.
> 
> I will prepare a PR that removes the dependency and implements our own extension of
> DateTime. Probably tomorrow.
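For illustration, a minimal stdlib sketch of the normalisation such a replacement type would have to apply (the function name is hypothetical; in a SQLAlchemy type this logic would sit in a TypeDecorator's process_result_value):

```python
from datetime import datetime, timezone, timedelta

def coerce_to_utc(value):
    """Normalize a DB-returned datetime to UTC.

    Naive values are assumed to already be UTC (Airflow's storage
    convention); aware values (e.g. carrying psycopg2's
    FixedOffsetTimezone) are converted to UTC.
    """
    if value is None:
        return None
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)

# A +02:00 value from the DB comes back pinned to UTC:
aware = datetime(2018, 8, 4, 2, 0, tzinfo=timezone(timedelta(hours=2)))
print(coerce_to_utc(aware).isoformat())  # 2018-08-04T00:00:00+00:00
```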
> 
> Good catch! Just in time :-(.
> 
> B.
> 
>> On 5 Aug 2018, at 22:43, Ash Berlin-Taylor <ash_airflowlist@firemirror.com> wrote:
>> 
>> Entirely possible, though I wasn't even dealing with the scheduler - the issue I
>> was addressing was entirely in the webserver for a pre-existing Task Instance.
>> 
>> Ah, I hadn't noticed/twigged that we are using sqlalchemy-utc. It appears that isn't
>> working right/as expected. This line: https://github.com/spoqa/sqlalchemy-utc/blob/master/sqlalchemy_utc/sqltypes.py#L34
>> doesn't look right for us - as you mentioned, the TZ is set to something (rather than
>> having no TZ value).
>> 
>> Some background on how Pg handles TZs: it always returns DTs in the TZ of the connection.
>> I'm not sure if this is unique to Postgres or if other DBs behave the same.
>> 
>> postgres=# select '2018-08-03 00:00:00+00:00'::timestamp with time zone;
>>     timestamptz
>> ------------------------
>> 2018-08-03 01:00:00+01
>> 
>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>     timestamptz
>> ------------------------
>> 2018-08-03 01:00:00+01
>> 
>> The server will always return TZs in the connection timezone.
>> 
>> postgres=# set timezone=utc;
>> SET
>> postgres=# select '2018-08-03 02:00:00+02'::timestamp with time zone;
>>     timestamptz
>> ------------------------
>> 2018-08-03 00:00:00+00
>> (1 row)
>> 
>> postgres=# select '2018-08-03 01:00:00+01'::timestamp with time zone;
>>     timestamptz
>> ------------------------
>> 2018-08-03 00:00:00+00
>> (1 row)
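The two literals above denote the same instant; a quick stdlib check shows the same normalisation (note Python's fromisoformat wants the offset written as "+02:00"):

```python
from datetime import datetime, timezone

# Same moment in time, expressed with two different offsets:
a = datetime.fromisoformat("2018-08-03 02:00:00+02:00")
b = datetime.fromisoformat("2018-08-03 01:00:00+01:00")
assert a == b  # equal instants despite the different offsets

# Normalised to UTC, both render identically:
print(a.astimezone(timezone.utc).isoformat())  # 2018-08-03T00:00:00+00:00
```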
>> 
>> 
>> 
>> 
>> -ash
>> 
>>> On 5 Aug 2018, at 21:28, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>> 
>>> This is the issue:
>>> 
>>> [2018-08-05 22:08:21,952] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-03 00:00:00+00:00 tzinfo: <Timezone [UTC]>
>>> [2018-08-05 22:08:22,007] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-03 02:00:00+02:00: scheduled__2018-08-03T00:00:00+00:00, externally triggered: False>
>>> 
>>> [2018-08-05 22:08:24,651] {jobs.py:906} INFO - NEXT RUN DATE: 2018-08-04 02:00:00+02:00 tzinfo: psycopg2.tz.FixedOffsetTimezone(offset=120, name=None)
>>> [2018-08-05 22:08:24,696] {jobs.py:1425} INFO - Created <DagRun example_http_operator @ 2018-08-04 02:00:00+02:00: scheduled__2018-08-04T02:00:00+02:00, externally triggered: False>
>>> 
>>> Notice in lines 1+2 that the next run date is correctly in UTC, but from the DB it
>>> gets a +2. In the next bit (3+4) we get a psycopg2.tz.FixedOffsetTimezone, which should
>>> be set to UTC according to the specs of https://github.com/spoqa/sqlalchemy-utc, but it
>>> isn't.
>>> 
>>> So changing your setting of the DB to UTC fixes the symptom but not the cause.
>>> 
>>> B.
>>> 
>>>> On 5 Aug 2018, at 22:03, Ash Berlin-Taylor <ash_airflowlist@firemirror.com> wrote:
>>>> 
>>>> Sorry for being terse before.
>>>> 
>>>> So the issue is that the ts loaded from the DB is not in UTC; it's in GB/+01
>>>> (the default of the DB server).
>>>> 
>>>> For me, on a currently running 1.9 (no TZ) db:
>>>> 
>>>> airflow=# select * from task_instance;
>>>> get_op            | example_http_operator | 2018-07-23 00:00:00
>>>> 
>>>> This datetime appears in the log URL, and the path it looks at on S3 is
>>>> 
>>>> .../example_http_operator/2018-07-23T00:00:00/1.log
>>>> 
>>>> If my postgres server has a default timezone of GB (which the one running
>>>> on my laptop does), and I then apply the migration, then it is converted to that local time.
>>>> 
>>>> airflow=# select * from task_instance;
>>>> get_op            | example_http_operator | 2018-07-23 01:00:00+01
>>>> 
>>>> airflow=# set timezone=UTC;
>>>> airflow=# select * from task_instance;
>>>> get_op            | example_http_operator | 2018-07-23 00:00:00+00
>>>> 
>>>> 
>>>> This is all okay so far. The migration has kept the column at the same moment
>>>> in time.
>>>> 
>>>> The issue comes when the UI tries to display logs for this old task: because the
>>>> timezone of the connection is not UTC, PG returns a date with a +01 TZ. Thus after the
>>>> migration this old task tries to look for a log file of
>>>> 
>>>> .../example_http_operator/2018-07-23T01:00:00/1.log
>>>> 
>>>> which doesn't exist - it's changed the time it has rendered from midnight (in v1.9)
>>>> to 1am (in v1.10).
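A minimal sketch of the mismatch, assuming the log path simply embeds isoformat() of the value as returned (paths illustrative, following the before/after shown in this thread):

```python
from datetime import datetime, timezone, timedelta

# The value as PG returns it on a connection whose timezone is GB (+01):
ts = datetime(2018, 7, 23, 1, 0, tzinfo=timezone(timedelta(hours=1)))

# Path rendered from the +01 value vs. the same instant normalised to UTC:
path_v110 = f".../example_http_operator/{ts.isoformat()}/1.log"
path_utc = f".../example_http_operator/{ts.astimezone(timezone.utc).isoformat()}/1.log"

print(path_v110)  # .../example_http_operator/2018-07-23T01:00:00+01:00/1.log
print(path_utc)   # .../example_http_operator/2018-07-23T00:00:00+00:00/1.log
```

Same instant, two different lookup keys, so logs written under the old path are no longer found.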
>>>> 
>>>> (This is with my change to log_filename_template from UPDATING.md in my other
>>>> branch)
>>>> 
>>>> Setting the timezone to UTC per connection means the behaviour of Airflow
>>>> doesn't change depending on how the server is configured.
>>>> 
>>>> -ash
>>>> 
>>>>> On 5 Aug 2018, at 20:58, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>> 
>>>>> Digging in a bit further. 
>>>>> 
>>>>> {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log
>>>>> 
>>>>> is the format.
>>>>> 
>>>>> ts = execution_date.isoformat() and should be in UTC afaik.
>>>>> 
>>>>> something is weird tbh.
>>>>> 
>>>>> B.
>>>>> 
>>>>> 
>>>>>> On 5 Aug 2018, at 21:32, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>> 
>>>>>> Ash,
>>>>>> 
>>>>>> Reading your proposed changes on your “set-timezone-to-utc” branch and the
>>>>>> analysis below, I am not sure what you are perceiving as an issue.
>>>>>> 
>>>>>> For conversion we assume everything is stored in UTC and in a naive format.
>>>>>> Conversion then adds the timezone information. This results in the following:
>>>>>> 
>>>>>> postgres timezone = “Europe/Amsterdam”
>>>>>> 
>>>>>> 
>>>>>> airflow=# select * from task_instance;
>>>>>> get_op            | example_http_operator | 2018-07-27 02:00:00+02
>>>>>> 
>>>>>> airflow=# set timezone=UTC;
>>>>>> airflow=# select * from task_instance;
>>>>>> get_op            | example_http_operator | 2018-07-27 00:00:00+00
>>>>>> 
>>>>>> If we don’t set the timezone in the connection, postgres assumes the server
>>>>>> timezone (in my case “Europe/Amsterdam”). So every datetime Airflow receives will be
>>>>>> in “Europe/Amsterdam” time. However, as we defined the model to use UTCDateTime, it
>>>>>> will always convert the returned DateTime to UTC.
>>>>>> 
>>>>>> If we have configured Airflow to use something other than UTC as the default
>>>>>> timezone, or a DAG has an associated timezone, we only convert to that timezone when
>>>>>> calculating the next runtime (not for cron, btw). Nowhere else, and thus we are UTC
>>>>>> everywhere.
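A stdlib sketch of that convention (using a fixed +02 offset as a stand-in for Europe/Amsterdam):

```python
from datetime import datetime, timezone, timedelta

stored = datetime(2018, 7, 27, 0, 0)         # naive in the DB, UTC by convention
aware = stored.replace(tzinfo=timezone.utc)  # what conversion adds on read

# Conversion to a local timezone happens only for next-run calculation:
amsterdam = timezone(timedelta(hours=2))     # CEST stand-in for Europe/Amsterdam
local = aware.astimezone(amsterdam)

print(aware.isoformat())  # 2018-07-27T00:00:00+00:00
print(local.isoformat())  # 2018-07-27T02:00:00+02:00
```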
>>>>>> 
>>>>>> What do you think is inconsistent?
>>>>>> 
>>>>>> Bolke
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On 5 Aug 2018, at 18:13, Ash Berlin-Taylor <ash_airflowlist@firemirror.com> wrote:
>>>>>>> 
>>>>>>> Relating to 2): I'm not sure that the upgrade from timezoneless to
>>>>>>> timezone-aware columns in the task instance is right, or at least it's not what I expected.
>>>>>>> 
>>>>>>> Before, weren't all TZs from schedule dates etc. in UTC? For the same task
>>>>>>> instance (these outputs are from psql directly):
>>>>>>> 
>>>>>>> before: execution_date=2017-09-04 00:00:00
>>>>>>> after: execution_date=2017-09-04 01:00:00+01
>>>>>>> 
>>>>>>> **Okay, the migration is fine**. It appears that the migration has done the
>>>>>>> right thing, but the local DB I'm testing with has a timezone of GB set, so Postgres
>>>>>>> converts it to that TZ on returning an object.
>>>>>>> 
>>>>>>> 3) Do we need to set the TZ of the connection to UTC in SQLAlchemy to have
>>>>>>> consistent behaviour? Is this possible somehow? I don't know SQLAlchemy that well.
>>>>>>> 
>>>>>>> 
>>>>>>> -ash
>>>>>>> 
>>>>>>>> On 5 Aug 2018, at 16:01, Ash Berlin-Taylor <ash_airflowlist@firemirror.com> wrote:
>>>>>>>> 
>>>>>>>> 1) Missing UPDATING note about the change of task_log_reader to now always
>>>>>>>> being "task" (it was "s3.task" before). Logging config is much simpler now, though.
>>>>>>>> This may be particular to my logging config, but given how much of a pain it was to
>>>>>>>> set up S3 logging in 1.9, I have shared my config with some people in the Gitter
>>>>>>>> chat, so it's not just me.
>>>>>>>> 
>>>>>>>> 2) The path that log files are written to in S3 has changed (again - this
>>>>>>>> happened from 1.8 to 1.9). I'd like to avoid having to move all of my log files
>>>>>>>> again to continue viewing them. The change is that the path now (in 1.10) has a
>>>>>>>> timezone in it, and the date is in local time; before it was UTC:
>>>>>>>> 
>>>>>>>> before: 2018-07-23T00:00:00/1.log
>>>>>>>> after: 2018-07-23T01:00:00+01:00/1.log
>>>>>>>> 
>>>>>>>> We can possibly get away with an updating note about this, telling people to
>>>>>>>> set a custom log_filename_template. Testing this now.
>>>>>>>> 
>>>>>>>>> On 5 Aug 2018, at 15:00, Ash Berlin-Taylor <ash@firemirror.com> wrote:
>>>>>>>>> 
>>>>>>>>> -1(binding) from me.
>>>>>>>>> 
>>>>>>>>> Installed with:
>>>>>>>>> 
>>>>>>>>> AIRFLOW_GPL_UNIDECODE=yes pip install 'https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz#egg=apache-airflow[emr,s3,crypto]>=1.10'
>>>>>>>>> 
>>>>>>>>> Install went fine.
>>>>>>>>> 
>>>>>>>>> Our DAGs that use SparkSubmitOperator are now failing, as there is now a
>>>>>>>>> hard dependency on the Kubernetes client libs, but the `emr` extras group doesn't
>>>>>>>>> mention this.
>>>>>>>>> 
>>>>>>>>> Introduced in https://github.com/apache/incubator-airflow/pull/3112
>>>>>>>>> 
>>>>>>>>> I see two options for this: either conditionally enable k8s:// support if
>>>>>>>>> the import works, or (less preferred) add kube-client to the emr deps.
>>>>>>>>> 
>>>>>>>>> Sorry - this is the first time I've been able to test it.
>>>>>>>>> 
>>>>>>>>> I will install this dep manually and continue testing.
>>>>>>>>> 
>>>>>>>>> -ash
>>>>>>>>> 
>>>>>>>>> (Normally I have no time at home due to the new baby, but I got a standing
>>>>>>>>> desk and a carrier, meaning she can sleep on me and I can use my laptop. Win!)
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>>> On 4 Aug 2018, at 22:32, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>> Bump. 
>>>>>>>>>> 
>>>>>>>>>> Committers please cast your vote. 
>>>>>>>>>> 
>>>>>>>>>> B.
>>>>>>>>>> 
>>>>>>>>>> Sent from my iPhone
>>>>>>>>>> 
>>>>>>>>>>> On 3 Aug 2018, at 13:23, Driesprong, Fokko <fokko@driesprong.frl> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> +1 Binding
>>>>>>>>>>> 
>>>>>>>>>>> Installed it using: SLUGIFY_USES_TEXT_UNIDECODE=yes pip install
>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/apache-airflow-1.10.0rc3+incubating-bin.tar.gz
>>>>>>>>>>> 
>>>>>>>>>>> Cheers, Fokko
>>>>>>>>>>> 
>>>>>>>>>>> 2018-08-03 9:47 GMT+02:00 Bolke de Bruin <bdbruin@gmail.com>:
>>>>>>>>>>> 
>>>>>>>>>>>> Hey all,
>>>>>>>>>>>> 
>>>>>>>>>>>> I have cut Airflow 1.10.0 RC3. This email is calling a vote on the
>>>>>>>>>>>> release, which will last for 72 hours. Consider this my (binding) +1.
>>>>>>>>>>>> 
>>>>>>>>>>>> Airflow 1.10.0 RC 3 is available at:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/1.10.0rc3/
>>>>>>>>>>>> 
>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-source.tar.gz is a source release
>>>>>>>>>>>> that comes with INSTALL instructions.
>>>>>>>>>>>> apache-airflow-1.10.0rc3+incubating-bin.tar.gz is the binary Python
>>>>>>>>>>>> "sdist" release.
>>>>>>>>>>>> 
>>>>>>>>>>>> Public keys are available at:
>>>>>>>>>>>> 
>>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/
>>>>>>>>>>>> 
>>>>>>>>>>>> The number of JIRAs fixed is over 700. Please have a look at the
>>>>>>>>>>>> changelog. Since RC2 the following has been fixed:
>>>>>>>>>>>> 
>>>>>>>>>>>> * [AIRFLOW-2817] Force explicit choice on GPL dependency
>>>>>>>>>>>> * [AIRFLOW-2716] Replace async and await py3.7 keywords
>>>>>>>>>>>> * [AIRFLOW-2810] Fix typo in Xcom model timestamp
>>>>>>>>>>>> 
>>>>>>>>>>>> Please note that the version number excludes the `rcX` string as
>>>>>>>>>>>> well as the "+incubating" string, so it's now simply 1.10.0. This
>>>>>>>>>>>> will allow us to rename the artifact without modifying the artifact
>>>>>>>>>>>> checksums when we actually release.
>>>>>>>>>>>> 
>>>>>>>>>>>> WARNING: Due to licensing requirements you will need to set
>>>>>>>>>>>> SLUGIFY_USES_TEXT_UNIDECODE=yes in your environment when installing
>>>>>>>>>>>> or upgrading. We will try to remove this requirement for the next
>>>>>>>>>>>> release.
>>>>>>>>>>>> 
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Bolke
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>> 
> 

