airflow-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ben Tallman <...@apigee.com>
Subject Re: dagrun_timeout not honoured?
Date Wed, 22 Jun 2016 00:01:24 GMT
Already there, twice, but one with a pull request...

https://github.com/apache/incubator-airflow/pull/1601

On Tue, Jun 21, 2016 at 4:53 PM Ben Tallman <ben@apigee.com> wrote:

> Maxime -
>
> Wish I did have time... BUT, I can say that SLA timeouts will fail (and
> error) on any dag with schedule=None because they depend on the Now() being
> less than the next scheduled date (None)...
>
> dttm = dag.following_schedule(dttm)
> while dttm < datetime.now():
>
> I will file a bug...
>
>
> On Tue, Jun 21, 2016 at 11:26 AM Maxime Beauchemin <
> maximebeauchemin@gmail.com> wrote:
>
>> A tangent here: for people who have the knowledge (and a bit of time on
>> their hand), providing a failing unit test can help the core committers
>> with an easy way to jump in to help.
>>
>> I always wonder whether I'll be able reproduce the bug, is it version
>> specific? environment specific? is it based on bad assumptions?
>>
>> With a failing unit test it's really clear what the expectations are and
>> it
>> makes it really easy for people can just jump in and fix it.
>>
>> Thanks,
>>
>> Max
>>
>> On Tue, Jun 21, 2016 at 9:15 AM, Ben Tallman <ben@apigee.com> wrote:
>>
>> > We have seen this too. Running 1.7.0 with Celery, neither DAG timeout
>> nor
>> > individual task sla's seem to be honored. In truth, we haven't done a
>> lot
>> > of testing, as it is more important that we get our overall ETL migrated
>> > with workarounds.
>> >
>> > However, we will be digging in at some point for greater clarity...
>> >
>> > On Thu, Jun 16, 2016 at 11:21 AM harish singh <harish.singh22@gmail.com
>> >
>> > wrote:
>> >
>> > > Hi guys,
>> > >
>> > > Since we have "dag_conurrency" restriction, I tried to play with
>> > >  dagrun_timeout.
>> > > So that after some interval, dag runs are marked failed and pipeline
>> > > progresses.
>> > > But this is not happening.
>> > >
>> > > I have this dag (@hourly):
>> > >
>> > > A -> B -> C  -> D -> E
>> > >
>> > > C: depends_on_past=true
>> > >
>> > > My dagrun_timeout is 60 minutes
>> > >
>> > > default_args = {
>> > >     'owner': 'airflow',
>> > >     'depends_on_past': False,
>> > >     'start_date': scheduling_start_date,
>> > >     'email': ['airflow@airflow.com'],
>> > >     'email_on_failure': False,
>> > >     'email_on_retry': False,
>> > >     'retries': 2,
>> > >     'retry_delay': default_retries_delay,
>> > >     'dagrun_timeout':datetime.timedelta(minutes=60)
>> > > }
>> > >
>> > >
>> > > Parallelism setting in airflow.cfg:
>> > >
>> > > parallelism = 8
>> > > dag_concurrency = 8
>> > > max_active_runs_per_dag = 8
>> > >
>> > >
>> > > For hour 1, all the tasks got completed.
>> > > Now in hour 2,  say task C failed.
>> > >
>> > > From hour 3 onwards, Tasks A and B keep running.
>> > > Task C never triggers because it depends on past (and past hour
>> failed)
>> > >
>> > > Since dag conurrency is 8, my pipeline progresses from hour 3 to hour
>> 10
>> > > (thats next 8 hours) for Tasks A and B. After this, pipeline stalls.
>> > >
>> > > "dagrun_timeout" was 60 minutes.  This should mean that after 60
>> minutes,
>> > > from hour 3 onwards, the DAG runs that has been up for more than 60
>> > minutes
>> > > should be marked FAILED  and the pipeline should progress?
>> > >
>> > > But this is not happening. So I am guessing my understanding here is
>> not
>> > > correct.
>> > >
>> > > What should be behavior when we use  "dagrun_timeout" ?
>> > > Also, how can I make sure that the dag proceeds in this situation?
>> > >
>> > > In the example I gave above, Task A and B should keep running every
>> hour
>> > > (since it doesnt depend on past).
>> > > Why it runs 8(dag_conurrency) instances  and stalls?
>> > >
>> > >
>> > > Thanks,
>> > > Harish
>> > >
>> >
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message