Date: Sun, 18 Dec 2016 20:03:59 +0000 (UTC)
From: "Bolke de Bruin (JIRA)"
To: commits@airflow.incubator.apache.org
Subject: [jira] [Comment Edited] (AIRFLOW-695) Retries do not execute because dagrun is in FAILED state

    [ https://issues.apache.org/jira/browse/AIRFLOW-695?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15759359#comment-15759359 ]

Bolke de Bruin edited comment on AIRFLOW-695 at 12/18/16 8:03 PM:
------------------------------------------------------------------

Ok, I think I figured out the issue. The scheduler checks the task instances without taking into account whether the executor has already reported back. In this case the executor reports back several iterations later. Because a task will not enter the queue while it is still considered running, the task state stays "queued" indefinitely, in limbo between the scheduler and the executor.

was (Author: bolke):
Ok I think I figure out the issue. The scheduler checks the tasks instances without taking into account if the executor already reported back. In this case the executor reports back several iterations later.
Due to the fact tasks will not enter the queue when the task is considered running, the task state will be "queued" indefinitely in limbo between the scheduler and the executor.

> Retries do not execute because dagrun is in FAILED state
> --------------------------------------------------------
>
>                 Key: AIRFLOW-695
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-695
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: DagRun
>            Reporter: Harvey Xia
>            Priority: Blocker
>              Labels: executor, scheduler
>
> Currently on the latest master commit (15ff540ecd5e60e7ce080177ea3ea227582a4672), running on the LocalExecutor, retries on tasks do not execute because the state of the corresponding dagrun changes to FAILED. The task instance then gets blocked because "Task instance's dagrun was not in the 'running' state but in the state 'failed'," the error message produced by the following lines: https://github.com/apache/incubator-airflow/blob/master/airflow/ti_deps/deps/dagrun_exists_dep.py#L48-L50
> This error can be reproduced with the following simple DAG:
> {code:title=DAG.py|borderStyle=solid}
> # Imports are assumed; the original report omitted them
> # (module paths as used by Airflow 1.x).
> import datetime
>
> from airflow import models
> from airflow.operators.bash_operator import BashOperator
>
> dag = models.DAG(dag_id='test_retry_handling')
> task = BashOperator(
>     task_id='test_retry_handling_op',
>     bash_command='exit 1',  # always fails, so a retry is always needed
>     retries=1,
>     retry_delay=datetime.timedelta(minutes=1),
>     dag=dag,
>     owner='airflow',
>     start_date=datetime.datetime(2016, 2, 1, 0, 0, 0))
> {code}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
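
The race described in the comment above can be illustrated with a toy model. This is only a sketch under stated assumptions, not Airflow's scheduler or executor code: ToyExecutor, simulate, report_delay and the tick loop are all hypothetical names made up for the illustration. It assumes that the task writes its own state as soon as it fails, that the executor keeps treating the task as running until it reports back report_delay heartbeats later, and that the scheduler never re-sends a task whose state is already "queued".

```python
class ToyExecutor:
    """Hypothetical executor that reports back `report_delay` heartbeats late."""

    def __init__(self, report_delay):
        self.report_delay = report_delay
        self.running = {}  # key -> heartbeats left until the executor reports back

    def queue(self, key):
        # A task the executor still considers running is not queued again.
        if key in self.running:
            return False
        self.running[key] = self.report_delay
        return True

    def heartbeat(self):
        # Count down and forget tasks whose result has now been reported.
        for key in list(self.running):
            if self.running[key] <= 0:
                del self.running[key]
            else:
                self.running[key] -= 1


def simulate(report_delay, retries=1, ticks=10):
    """Run a toy scheduler loop and return the final task state."""
    executor = ToyExecutor(report_delay)
    key = "test_retry_handling_op"
    state, tries = None, 0
    for _ in range(ticks):
        # Scheduler pass: it looks only at the task state, not at whether
        # the executor has already reported back.
        if state in (None, "up_for_retry"):
            state = "queued"
            if executor.queue(key):
                # The task runs and fails at once (like `exit 1`) and
                # records its own state.
                tries += 1
                state = "up_for_retry" if tries <= retries else "failed"
        executor.heartbeat()
    return state


print(simulate(report_delay=0))  # executor reports back before the next pass
print(simulate(report_delay=3))  # executor reports back several passes later
```

With report_delay=0 the retry is accepted and the task ends up "failed" once its retries are used up. With report_delay=3 the retry is refused while the executor still considers the task running, and since nothing re-sends a "queued" task in this model, the state never leaves "queued".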