airflow-dev mailing list archives

From harish singh <harish.sing...@gmail.com>
Subject depends_on_past not working as expected?
Date Fri, 13 May 2016 20:19:48 GMT
Hi guys,

I am having trouble getting 'depends_on_past=True' to work.

This is my pipeline:

a -> b -> c -> d -> e

a -> x -> e

a -> y -> e

I have default args for all Tasks:

from datetime import datetime, timedelta

# Previous full hour, in UTC.
scheduling_start_date = (datetime.utcnow() - timedelta(hours=1)).replace(
    minute=0, second=0, microsecond=0)
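To make the start date concrete, here is a minimal sketch of the same arithmetic as a standalone function. The name `previous_full_hour` is hypothetical and used only for illustration; it takes the time as a parameter so the result can be checked against a fixed timestamp instead of the wall clock.

```python
from datetime import datetime, timedelta


def previous_full_hour(now):
    """Return the top of the hour that precedes `now` (hypothetical helper)."""
    return (now - timedelta(hours=1)).replace(minute=0, second=0, microsecond=0)


# Fixed timestamp (the time this message was sent) so the result is reproducible.
example = previous_full_hour(datetime(2016, 5, 13, 20, 19, 48))
print(example)  # 2016-05-13 19:00:00
```

Note that the value depends on when the DAG file is parsed, so it moves forward each time the file is re-evaluated.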

default_args = {
    'owner': 'airflow',
    'depends_on_past': False,
    'start_date': scheduling_start_date,
    'email': ['airflow@airflow.com'],
    'email_on_failure': False,
    'email_on_retry': False,
    'retries': 2,
    'retry_delay': default_retries_delay,
    # 'queue': 'bash_queue',
    # 'pool': 'backfill',
    # 'priority_weight': 10,
    # 'end_date': datetime(2016, 1, 1),
}


For tasks d, x and y specifically, I override this in the task definition:

    depends_on_past=True
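As far as I understand, an argument passed directly to a task takes precedence over the same key in default_args. The plain-Python sketch below models the effective settings I expect for each task; it does not use Airflow itself, and `task_args`, `strict_tasks` and `effective` are hypothetical names for illustration only.

```python
# Stand-in for the shared defaults (only the relevant keys are shown).
default_args = {'owner': 'airflow', 'depends_on_past': False}


def task_args(defaults, **overrides):
    """Hypothetical helper: copy the defaults, then apply per-task overrides."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


# Tasks that should wait on their own previous run.
strict_tasks = {'d', 'x', 'y'}

effective = {}
for name in ['a', 'b', 'c', 'd', 'x', 'y', 'e']:
    overrides = {'depends_on_past': True} if name in strict_tasks else {}
    effective[name] = task_args(default_args, **overrides)

print(sorted(n for n, args in effective.items() if args['depends_on_past']))
# ['d', 'x', 'y']
```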


So now:
During the first hour, d, x and y failed.
I assumed that in the next hour these tasks would not even be attempted.
Is that right?
But in the next hour and in subsequent hours, I see these tasks still being
triggered (and failing).
Should the behavior be that if a task's previous execution failed, no
attempt is made during the next run of the DAG?
Or am I doing something very "bad" here?
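To spell out the rule I am assuming, here is a minimal sketch. It describes my expectation only, not Airflow's actual scheduling code; `should_attempt` and the state strings are hypothetical names chosen for illustration, and `None` stands for "no previous run exists".

```python
def should_attempt(depends_on_past, previous_state):
    """Expected rule (an assumption, not Airflow's implementation).

    A task without depends_on_past is always attempted. A task with
    depends_on_past is attempted only on its first run (no previous
    state) or when its previous run succeeded.
    """
    if not depends_on_past:
        return True
    return previous_state in (None, 'success')


print(should_attempt(True, None))       # True: first hour, nothing to wait on
print(should_attempt(True, 'failed'))   # False: what I expected for d, x and y
print(should_attempt(False, 'failed'))  # True: other tasks ignore the past
```

Under this rule, d, x and y would stay blocked in the second hour, which is not what I am observing.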


Thanks,
Harish
