airflow-dev mailing list archives

From Bolke de Bruin <bdbr...@gmail.com>
Subject Re: [VOTE] Release Airflow 1.8.0 based on Airflow 1.8.0rc3
Date Mon, 13 Feb 2017 20:25:51 GMT
A little more background on the issue. Mark success sits in views.py as “def success”.
The code should mark a task “successful”, optionally together with its upstream and downstream
tasks, including tasks in the past and in the future (up until datetime.now()). It was often used
to kick off the first dag run when “depends_on_past” was set. As of 1.8.0 this is
no longer required. The code is complex, lacks tests and, more importantly, is outdated:
it creates tasks on its own without dag runs, and it is not aware of the “NONE” state. On top
of that it is buggy (upstream and downstream currently do the same thing, i.e. only downstream).
Hence, in my opinion it requires refactoring, which I am doing at the moment.
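
To make the intended upstream/downstream behaviour concrete, here is a minimal sketch. It is
not the code in views.py: the toy DAG and the names DOWNSTREAM, invert, relatives and
tasks_to_mark are all hypothetical and only illustrate that the two options should select
different sets of tasks.

```python
from collections import deque

# Toy DAG as an adjacency map: task_id -> direct downstream task_ids.
# Hypothetical example data, not taken from Airflow.
DOWNSTREAM = {
    "extract": ["transform"],
    "transform": ["load", "report"],
    "load": [],
    "report": [],
}


def invert(edges):
    """Build the reverse map: task_id -> direct upstream task_ids."""
    upstream = {task: [] for task in edges}
    for task, children in edges.items():
        for child in children:
            upstream[child].append(task)
    return upstream


def relatives(task_id, edges):
    """Return every task reachable from task_id by following edges."""
    seen, queue = set(), deque([task_id])
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def tasks_to_mark(task_id, upstream=False, downstream=False):
    """Select the task ids a mark-success request should touch."""
    selected = {task_id}
    if upstream:
        # Walk the inverted graph, so this really goes upstream.
        selected |= relatives(task_id, invert(DOWNSTREAM))
    if downstream:
        selected |= relatives(task_id, DOWNSTREAM)
    return selected
```

With the bug described above, asking for upstream would return the downstream set instead;
in this sketch the two flags walk opposite directions of the graph.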

Two small fixes could be included in the release, but they don’t solve the root cause.

* https://github.com/apache/incubator-airflow/pull/2075
* https://github.com/apache/incubator-airflow/pull/2074

I suggest fixing this properly in 1.8.1. Chris :) volunteered to do 1.8.1 soon after 1.8.0.

Any thoughts?

Bolke

> On 13 Feb 2017, at 20:59, Bolke de Bruin <bdbruin@gmail.com> wrote:
> 
> https://github.com/apache/incubator-airflow/pull/2075
> 
> Is (part of) the fix. I can include it retroactively if needed, but I don’t consider
> it blocking.
> 
> Bolke
> 
> 
>> On 13 Feb 2017, at 20:56, Dan Davydov <dan.davydov@airbnb.com.INVALID> wrote:
>> 
>> Can you give more details/a repro case Sid? FWIW mark success and clear
>> both work for me.
>> 
>> On Mon, Feb 13, 2017 at 11:51 AM, siddharth anand <sanand@apache.org> wrote:
>> 
>>> Folks!
>>> I need to change my vote.. -1 (Binding).
>>> 
>>> 
>>> Mark Success/Clear is broken in the UI. It's a regression.
>>> 
>>> -s
>>> 
>>> On Mon, Feb 13, 2017 at 10:53 AM, Alex Van Boxel <alex@vanboxel.be> wrote:
>>> 
>>>> +1 (binding)
>>>> 
>>>> On Mon, Feb 13, 2017 at 7:45 PM siddharth anand <sanand@apache.org> wrote:
>>>> 
>>>>> +1 (binding)
>>>>> 
>>>>> On Mon, Feb 13, 2017 at 8:57 AM, Chris Riccomini <criccomini@apache.org> wrote:
>>>>> 
>>>>>> +1 (binding)
>>>>>> 
>>>>>> On Sun, Feb 12, 2017 at 8:54 AM, Jeremiah Lowin <jlowin@apache.org> wrote:
>>>>>> 
>>>>>>> Interesting -- I also run on Kubernetes with a git-sync sidecar, but the
>>>>>>> containers wait for the synced repo to appear before starting since it
>>>>>>> contains some dependencies -- I assume that's why I didn't experience the
>>>>>>> same issue.
>>>>>>> 
>>>>>>> On Sun, Feb 12, 2017 at 6:29 AM Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>> 
>>>>>>>> Although the race condition doesn't explain why “num_runs = None” resolved
>>>>>>>> the issue for you earlier, it does give a clue now: the PR that
>>>>>>>> introduced “num_runs = -1” was there to be able to work with empty dag
>>>>>>>> dirs, maybe it wasn’t fully covered yet.
>>>>>>>> 
>>>>>>>> Bolke
>>>>>>>> 
>>>>>>>>> On 12 Feb 2017, at 12:26, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>> Ok great! Thanks! That sounds like a race condition: module not
>>>>>>>>> available yet at time of reading. I would expect that it resolves itself
>>>>>>>>> after a while.
>>>>>>>>> 
>>>>>>>>> After talking to some people at the Warsaw BigData conf I have some
>>>>>>>>> ideas around syncing dags. Spoiler: no dependency on git.
>>>>>>>>> 
>>>>>>>>> - Bolke
>>>>>>>>> 
>>>>>>>>>> On 12 Feb 2017, at 11:17, Alex Van Boxel <alex@vanboxel.be> wrote:
>>>>>>>>>> 
>>>>>>>>>> Running ok, in staging... @bolke I'm running patch-less. I've switched my
>>>>>>>>>> Kubernetes from:
>>>>>>>>>> 
>>>>>>>>>> - each container (webserver/scheduler/worker) had a git-sync'er (getting
>>>>>>>>>> the dags from git)
>>>>>>>>>> > this meant that the scheduler had 0 dags at startup, and should have
>>>>>>>>>> picked them up later
>>>>>>>>>> 
>>>>>>>>>> to
>>>>>>>>>> 
>>>>>>>>>> - single NFS share that shares airflow_home over each container
>>>>>>>>>> > the git sync'er is now a separate container running before the other
>>>>>>>>>> containers
>>>>>>>>>> 
>>>>>>>>>> This resolved my mystery DAG crashes.
>>>>>>>>>> 
>>>>>>>>>> I'll be updating production to a patchless RC3 today, you get my vote after
>>>>>>>>>> that.
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>>> On Sun, Feb 12, 2017 at 4:59 AM Boris Tyukin <boris@boristyukin.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> awesome! thanks Jeremiah
>>>>>>>>>>> 
>>>>>>>>>>> On Sat, Feb 11, 2017 at 12:53 PM, Jeremiah Lowin <jlowin@apache.org> wrote:
>>>>>>>>>>> 
>>>>>>>>>>>> Boris, I submitted a PR to address your second point --
>>>>>>>>>>>> https://github.com/apache/incubator-airflow/pull/2068. Thanks!
>>>>>>>>>>>> 
>>>>>>>>>>>> On Sat, Feb 11, 2017 at 10:42 AM Boris Tyukin <boris@boristyukin.com> wrote:
>>>>>>>>>>>> 
>>>>>>>>>>>>> I am running LocalExecutor and not doing crazy things but use DAG
>>>>>>>>>>>>> generation heavily - everything runs fine as before. As I mentioned in
>>>>>>>>>>>>> other threads only had a few issues:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 1) had to upgrade MySQL which was a PAIN. Cloudera CDH is running old
>>>>>>>>>>>>> version of MySQL which was compatible with 1.7.1 but not compatible now
>>>>>>>>>>>>> with 1.8 because of fractional seconds support PR.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 2) when you install airflow, there are two new example DAGs
>>>>>>>>>>>>> (last_task_only) which are going back very far in the past and scheduled to
>>>>>>>>>>>>> run every hour - a bunch of dags triggered on the first start of scheduler
>>>>>>>>>>>>> and hosed my CPU
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Everything else was fine and I LOVE lots of small UI changes, which reduced
>>>>>>>>>>>>> a lot my use of cli.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks again for the amazing work and an awesome project!
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>> On Sat, Feb 11, 2017 at 9:17 AM, Jeremiah Lowin <jlowin@apache.org> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I was able to deploy successfully. +1 (binding)
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Fri, Feb 10, 2017 at 7:37 PM Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Fri, Feb 10, 2017 at 3:44 PM, Arthur Wiedmer <arthur.wiedmer@gmail.com> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> On Feb 10, 2017 3:13 PM, "Dan Davydov" <dan.davydov@airbnb.com.invalid> wrote:
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> Our staging looks good, all the DAGs there pass.
>>>>>>>>>>>>>>>>> +1 (binding)
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> On Fri, Feb 10, 2017 at 10:21 AM, Chris Riccomini <criccomini@apache.org> wrote:
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> Running in all environments. Will vote after the weekend to make sure
>>>>>>>>>>>>>>>>>> things are working properly, but so far so good.
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>> On Fri, Feb 10, 2017 at 6:05 AM, Bolke de Bruin <bdbruin@gmail.com> wrote:
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Dear All,
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Let’s try again!
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I have made the THIRD RELEASE CANDIDATE of Airflow 1.8.0 available at:
>>>>>>>>>>>>>>>>>>> https://dist.apache.org/repos/dist/dev/incubator/airflow/ , public keys
>>>>>>>>>>>>>>>>>>> are available at
>>>>>>>>>>>>>>>>>>> https://dist.apache.org/repos/dist/release/incubator/airflow/ . It is
>>>>>>>>>>>>>>>>>>> tagged with a local version “apache.incubating” so it allows
>>>>>>>>>>>>>>>>>>> upgrading from earlier releases.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Two issues have been fixed since release candidate 2:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> * trigger_dag could create dags with fractional seconds, not supported by
>>>>>>>>>>>>>>>>>>> logging and UI at the moment
>>>>>>>>>>>>>>>>>>> * local api client trigger_dag had hardcoded execution of None
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Known issue:
>>>>>>>>>>>>>>>>>>> * Airflow on kubernetes and num_runs -1 (default) can expose import
>>>>>>>>>>>>>>>>>>> issues.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> I have extensively discussed this with Alex (reporter) and we consider
>>>>>>>>>>>>>>>>>>> this a known issue with a workaround available as we are unable to
>>>>>>>>>>>>>>>>>>> replicate this in a different environment. UPDATING.md has been updated
>>>>>>>>>>>>>>>>>>> with the work around.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> As these issues are confined to a very specific area and full unit tests
>>>>>>>>>>>>>>>>>>> were added I would also like to raise a VOTE for releasing 1.8.0 based on
>>>>>>>>>>>>>>>>>>> release candidate 3, i.e. just renaming release candidate 3 to 1.8.0
>>>>>>>>>>>>>>>>>>> release.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Please respond to this email by:
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> +1,0,-1 with *binding* if you are a PMC member or *non-binding* if you are
>>>>>>>>>>>>>>>>>>> not.
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> Thanks!
>>>>>>>>>>>>>>>>>>> Bolke
>>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>>>> My VOTE: +1 (binding)
>>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> --
>>>>>>>>>> _/
>>>>>>>>>> _/ Alex Van Boxel
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> --
>>>>  _/
>>>> _/ Alex Van Boxel
>>>> 
>>> 
> 

