airflow-dev mailing list archives

From harish singh <harish.sing...@gmail.com>
Subject Re: depends_on_past not working as expected?
Date Fri, 13 May 2016 21:06:08 GMT
We are seeing this in production, so I won't be able to update the version
right now, but I will try to test this out over the weekend.
But if I consider 1.7.0, am I doing something incorrect? Or did something
change in 1.7.1.rc6?

One thing I forgot to mention: we run a backfill before we turn on the DAG.
So if I have to turn the DAG on right now, I first run a backfill for the
last 24 hours and then turn the DAG on (from the UI) so that it gets
scheduled by the scheduler.
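Assuming the DAG is scheduled hourly (the thread talks about "the next hour"), the backfill step described above covers a window of 24 hourly execution dates, which would typically be passed to `airflow backfill` as `-s`/`-e` bounds. The following is a minimal sketch of that window. `hourly_execution_dates` is a hypothetical helper, not an Airflow API, and the hourly schedule is an assumption.

```python
from datetime import datetime, timedelta


def hourly_execution_dates(end, hours=24):
    """Hypothetical helper (not part of Airflow): the hourly execution dates
    a backfill over the last `hours` hours would cover, oldest first.

    Assumes an hourly schedule; `end` is truncated to the top of the hour.
    """
    end = end.replace(minute=0, second=0, microsecond=0)
    return [end - timedelta(hours=h) for h in range(hours - 1, -1, -1)]


dates = hourly_execution_dates(datetime(2016, 5, 13, 21, 6))
print(dates[0], dates[-1], len(dates))
# 2016-05-12 22:00:00 2016-05-13 21:00:00 24
```

Under `depends_on_past=True`, each date in this window would be gated on the instance one hour before it.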

Nevertheless, I am going to try this scenario on 1.7.1.rc6.

Thanks!


On Fri, May 13, 2016 at 1:54 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:

>
> > On 13 May 2016, at 22:51, harish singh <harish.singh22@gmail.com> wrote:
> >
> > Bolke, it's 1.7.0.
> >
> >
> > On Fri, May 13, 2016 at 1:35 PM, Bolke de Bruin <bdbruin@gmail.com> wrote:
> >
> >>
> >>> On 13 May 2016, at 22:19, harish singh <harish.singh22@gmail.com> wrote:
> >>>
> >>> Hi guys,
> >>>
> >>> I am having trouble getting 'depends_on_past=True' to work.
> >>>
> >>> This is my pipeline:
> >>>
> >>> a -> b -> c -> d -> e
> >>>
> >>> a -> x -> e
> >>>
> >>> a -> y -> e
> >>>
> >>> I have default args for all Tasks:
> >>>
> >>> from datetime import datetime, timedelta
> >>>
> >>> # One hour before "now" (UTC), truncated to the top of the hour.
> >>> scheduling_start_date = (datetime.utcnow() -
> >>>                          timedelta(hours=1)).replace(minute=0, second=0,
> >>>                                                      microsecond=0)
> >>>
> >>> # Hypothetical value: the real default_retries_delay was not shown.
> >>> default_retries_delay = timedelta(minutes=5)
> >>>
> >>> default_args = {
> >>>   'owner': 'airflow',
> >>>   'depends_on_past': False,
> >>>   'start_date': scheduling_start_date,
> >>>   'email': ['airflow@airflow.com'],
> >>>   'email_on_failure': False,
> >>>   'email_on_retry': False,
> >>>   'retries': 2,
> >>>   'retry_delay': default_retries_delay,
> >>>   # 'queue': 'bash_queue',
> >>>   # 'pool': 'backfill',
> >>>   # 'priority_weight': 10,
> >>>   # 'end_date': datetime(2016, 1, 1),
> >>> }
> >>>
> >>>
> >>> But specifically for tasks d, x and y, I have depends_on_past set to True:
> >>>
> >>> depends_on_past=True
> >>>
> >>>
> >>> So now:
> >>> For the first hour, d, x and y failed.
> >>> So I am assuming that in the next hour these tasks should not even be
> >>> tried, right?
> >>> But in the next hour and in subsequent hours I see these tasks getting
> >>> triggered (and failing)...
> >>> Should the behavior be that if a task's previous execution failed, no
> >>> attempt is made during the next run of the DAG?
> >>> Or am I doing something very "bad" here?
> >>
> >>
> >> Which version are you on, Harish?
> >>
> >>
>
> Can you try 1.7.1.rc6 before we dive in?
>
>
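The following is a simplified model of the behavior being asked about. It is a sketch under stated assumptions and not Airflow's code. `effective_args` and `may_run` are hypothetical helpers. The model assumes three things:

- An explicit operator keyword such as `depends_on_past=True` overrides the same key in `default_args`.
- A task instance with `depends_on_past` may start only if the same task's previous instance succeeded. Treating "skipped" like success is an assumption here.
- The instance whose execution date equals the task's `start_date` is exempt, as the `BaseOperator` documentation describes.

Because `scheduling_start_date` above is computed from `utcnow()` each time the file is parsed, the last case may be worth checking.

```python
from datetime import datetime, timedelta

# Hypothetical, simplified model for illustration; NOT Airflow's implementation.
OK_STATES = {"success", "skipped"}  # assumption: "skipped" counts as success

default_args = {"owner": "airflow", "depends_on_past": False, "retries": 2}


def effective_args(defaults, **overrides):
    """Merge per-task keyword arguments over default_args (explicit kwargs win)."""
    args = dict(defaults)
    args.update(overrides)
    return args


def may_run(execution_date, start_date, depends_on_past, prev_state):
    """Return True if this task instance would be allowed to start.

    prev_state is the state of the same task at the previous schedule,
    or None if no previous instance was recorded.
    """
    if not depends_on_past:
        return True
    # Exemption: the instance at start_date may always run.
    if execution_date == start_date:
        return True
    return prev_state in OK_STATES


hour = timedelta(hours=1)
start = datetime(2016, 5, 13, 18, 0)

d_args = effective_args(default_args, depends_on_past=True)  # like task d
b_args = effective_args(default_args)                        # like task b

# Task d failed at 18:00, so its 19:00 instance is held back...
print(may_run(start + hour, start, d_args["depends_on_past"], "failed"))  # False
# ...while task b, which keeps depends_on_past=False, is not.
print(may_run(start + hour, start, b_args["depends_on_past"], "failed"))  # True

# If start_date were recomputed (e.g. from utcnow()) so that it equals the
# execution date being checked, the exemption would apply despite the failure.
print(may_run(start + hour, start + hour, True, "failed"))  # True
```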
