airflow-dev mailing list archives

From Maxime Beauchemin <maximebeauche...@gmail.com>
Subject Re: dagrun_timeout not honoured?
Date Tue, 21 Jun 2016 18:26:49 GMT
A tangent here: for people who have the knowledge (and a bit of time on
their hands), providing a failing unit test gives the core committers an
easy way to jump in and help.

I always wonder whether I'll be able to reproduce the bug: is it version
specific? environment specific? is it based on bad assumptions?

With a failing unit test it's really clear what the expectations are, and
it makes it really easy for people to just jump in and fix it.
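To illustrate the shape of such a test: the sketch below uses a toy stand-in
for a DAG run rather than Airflow's real DagRun class, purely to show how a
failing test pins down the expected timeout behavior (all names here are
hypothetical, not Airflow's API):

```python
import datetime
import unittest


class FakeDagRun:
    """Toy stand-in for a DAG run; NOT Airflow's DagRun model."""

    def __init__(self, start_date, timeout):
        self.start_date = start_date
        self.timeout = timeout
        self.state = "running"

    def check_timeout(self, now):
        # The expected contract: a run older than its timeout is failed.
        if now - self.start_date > self.timeout:
            self.state = "failed"


class DagrunTimeoutTest(unittest.TestCase):
    def test_stale_run_is_marked_failed(self):
        start = datetime.datetime(2016, 6, 16, 0, 0)
        run = FakeDagRun(start, datetime.timedelta(minutes=60))
        # 90 minutes after the start, the run should have timed out.
        run.check_timeout(start + datetime.timedelta(minutes=90))
        self.assertEqual(run.state, "failed")


if __name__ == "__main__":
    unittest.main()
```

A test like this, written against the real scheduler instead of a toy class,
makes the expected behavior unambiguous even if it fails on master.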

Thanks,

Max
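
P.S. One thing worth double-checking in the config quoted below:
dagrun_timeout is a DAG-level argument, while entries in default_args are
handed down to the operators, so a dagrun_timeout placed in default_args is
likely ignored. A hedged sketch of moving it onto the DAG constructor
(dag id and schedule are made up for illustration):

```python
from datetime import timedelta

from airflow import DAG

# dagrun_timeout belongs on the DAG itself, not in default_args;
# the scheduler can then fail runs still going after 60 minutes.
dag = DAG(
    'my_hourly_pipeline',            # hypothetical dag id
    default_args=default_args,       # task-level defaults only
    schedule_interval='@hourly',
    dagrun_timeout=timedelta(minutes=60),
)
```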

On Tue, Jun 21, 2016 at 9:15 AM, Ben Tallman <ben@apigee.com> wrote:

> We have seen this too. Running 1.7.0 with Celery, neither DAG timeouts nor
> individual task SLAs seem to be honored. In truth, we haven't done a lot
> of testing, as it is more important that we get our overall ETL migrated
> with workarounds.
>
> However, we will be digging in at some point for greater clarity...
>
> On Thu, Jun 16, 2016 at 11:21 AM harish singh <harish.singh22@gmail.com>
> wrote:
>
> > Hi guys,
> >
> > Since we have the "dag_concurrency" restriction, I tried to play with
> > dagrun_timeout, so that after some interval, dag runs are marked failed
> > and the pipeline progresses.
> > But this is not happening.
> >
> > I have this dag (@hourly):
> >
> > A -> B -> C  -> D -> E
> >
> > C: depends_on_past=true
> >
> > My dagrun_timeout is 60 minutes
> >
> > default_args = {
> >     'owner': 'airflow',
> >     'depends_on_past': False,
> >     'start_date': scheduling_start_date,
> >     'email': ['airflow@airflow.com'],
> >     'email_on_failure': False,
> >     'email_on_retry': False,
> >     'retries': 2,
> >     'retry_delay': default_retries_delay,
> >     'dagrun_timeout': datetime.timedelta(minutes=60)
> > }
> >
> >
> > Parallelism setting in airflow.cfg:
> >
> > parallelism = 8
> > dag_concurrency = 8
> > max_active_runs_per_dag = 8
> >
> >
> > For hour 1, all the tasks got completed.
> > Now in hour 2,  say task C failed.
> >
> > From hour 3 onwards, Tasks A and B keep running.
> > Task C never triggers because it depends on past (and past hour failed)
> >
> > Since dag_concurrency is 8, my pipeline progresses from hour 3 to hour 10
> > (that's the next 8 hours) for Tasks A and B. After this, the pipeline stalls.
> >
> > "dagrun_timeout" was 60 minutes.  This should mean that after 60 minutes,
> > from hour 3 onwards, the DAG runs that has been up for more than 60
> minutes
> > should be marked FAILED  and the pipeline should progress?
> >
> > But this is not happening. So I am guessing my understanding here is not
> > correct.
> >
> > What should be behavior when we use  "dagrun_timeout" ?
> > Also, how can I make sure that the dag proceeds in this situation?
> >
> > In the example I gave above, Tasks A and B should keep running every hour
> > (since they don't depend on past).
> > Why does it run 8 (dag_concurrency) instances and then stall?
> >
> >
> > Thanks,
> > Harish
> >
>
