From: Ben Tallman
Date: Tue, 21 Jun 2016 23:53:21 +0000
Subject: Re: dagrun_timeout not honoured?
To: dev@airflow.incubator.apache.org

Maxime -

I wish I had the time... But I can say that SLA timeouts will fail (and raise an error) on any DAG with schedule=None, because the check compares the next scheduled date against datetime.now(), and for such a DAG the next scheduled date is None:

    dttm = dag.following_schedule(dttm)
    while dttm < datetime.now():

I will file a bug...

On Tue, Jun 21, 2016 at 11:26 AM Maxime Beauchemin <maximebeauchemin@gmail.com> wrote:

> A tangent here: for people who have the knowledge (and a bit of time on
> their hands), providing a failing unit test gives the core committers
> an easy way to jump in and help.
>
> I always wonder whether I'll be able to reproduce the bug. Is it version
> specific? Environment specific? Is it based on bad assumptions?
>
> With a failing unit test it's really clear what the expectations are, and
> it makes it really easy for people to just jump in and fix it.
>
> Thanks,
>
> Max
>
> On Tue, Jun 21, 2016 at 9:15 AM, Ben Tallman wrote:
>
> > We have seen this too. Running 1.7.0 with Celery, neither DAG timeout nor
> > individual task SLAs seem to be honored. In truth, we haven't done a lot
> > of testing, as it is more important that we get our overall ETL migrated
> > with workarounds.
> >
> > However, we will be digging in at some point for greater clarity...
> >
> > On Thu, Jun 16, 2016 at 11:21 AM harish singh wrote:
> >
> > > Hi guys,
> > >
> > > Since we have the "dag_concurrency" restriction, I tried to play with
> > > dagrun_timeout, so that after some interval DAG runs are marked failed
> > > and the pipeline progresses.
> > > But this is not happening.
> > >
> > > I have this dag (@hourly):
> > >
> > > A -> B -> C -> D -> E
> > >
> > > C: depends_on_past=true
> > >
> > > My dagrun_timeout is 60 minutes
> > >
> > > default_args = {
> > >     'owner': 'airflow',
> > >     'depends_on_past': False,
> > >     'start_date': scheduling_start_date,
> > >     'email': ['airflow@airflow.com'],
> > >     'email_on_failure': False,
> > >     'email_on_retry': False,
> > >     'retries': 2,
> > >     'retry_delay': default_retries_delay,
> > >     'dagrun_timeout': datetime.timedelta(minutes=60)
> > > }
> > >
> > > Parallelism setting in airflow.cfg:
> > >
> > > parallelism = 8
> > > dag_concurrency = 8
> > > max_active_runs_per_dag = 8
> > >
> > > For hour 1, all the tasks got completed.
> > > Now in hour 2, say task C failed.
> > >
> > > From hour 3 onwards, Tasks A and B keep running.
> > > Task C never triggers because it depends on past (and the past hour failed).
> > >
> > > Since dag concurrency is 8, my pipeline progresses from hour 3 to hour 10
> > > (that's the next 8 hours) for Tasks A and B. After this, the pipeline stalls.
> > >
> > > "dagrun_timeout" was 60 minutes. This should mean that after 60 minutes,
> > > from hour 3 onwards, the DAG runs that have been up for more than 60 minutes
> > > should be marked FAILED and the pipeline should progress?
> > >
> > > But this is not happening. So I am guessing my understanding here is not
> > > correct.
> > >
> > > What should the behavior be when we use "dagrun_timeout"?
> > > Also, how can I make sure that the dag proceeds in this situation?
> > >
> > > In the example I gave above, Tasks A and B should keep running every hour
> > > (since they don't depend on past).
> > > Why does it run 8 (dag_concurrency) instances and then stall?
> > >
> > > Thanks,
> > > Harish
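
The comparison Ben quotes at the top of this message can be reproduced without Airflow. What follows is a minimal Python 3 sketch, not Airflow's actual code: `following_schedule` is a hypothetical stand-in that is assumed to return None when a DAG has no schedule, and `count_elapsed_periods` and `count_elapsed_periods_guarded` are names invented for illustration. It shows why the unguarded loop errors on schedule=None and one possible way to guard it.

```python
from datetime import datetime, timedelta


def following_schedule(dttm, schedule_interval):
    """Hypothetical stand-in for dag.following_schedule().

    Assumption: a DAG with no schedule has no next scheduled date,
    so this returns None when schedule_interval is None.
    """
    if schedule_interval is None:
        return None
    return dttm + schedule_interval


def count_elapsed_periods(start, schedule_interval, now):
    """Walk forward one period at a time, as in the quoted loop."""
    periods = 0
    dttm = following_schedule(start, schedule_interval)
    while dttm < now:  # raises TypeError in Python 3 when dttm is None
        periods += 1
        dttm = following_schedule(dttm, schedule_interval)
    return periods


def count_elapsed_periods_guarded(start, schedule_interval, now):
    """Same walk, but stop as soon as there is no next scheduled date."""
    periods = 0
    dttm = following_schedule(start, schedule_interval)
    while dttm is not None and dttm < now:
        periods += 1
        dttm = following_schedule(dttm, schedule_interval)
    return periods
```

With an hourly interval both functions behave the same; with `schedule_interval=None` the first raises TypeError on the comparison while the guarded version simply finds no elapsed periods. Whether skipping the check is the right behaviour for unscheduled DAGs is a design question for the bug report, not something this sketch settles.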