Date: Sun, 5 Mar 2017 13:08:33 +0000 (UTC)
From: "Bolke de Bruin (JIRA)"
To: commits@airflow.incubator.apache.org
Subject: [jira] [Commented] (AIRFLOW-931) LocalExecutor fails to run queued task with race condition

    [ https://issues.apache.org/jira/browse/AIRFLOW-931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15896260#comment-15896260 ]

Bolke de Bruin commented on AIRFLOW-931:
----------------------------------------

[~v_krishna] please see https://github.com/apache/incubator-airflow/pull/2127 and test it.

> LocalExecutor fails to run queued task with race condition
> ----------------------------------------------------------
>
>                 Key: AIRFLOW-931
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-931
>             Project: Apache Airflow
>          Issue Type: Sub-task
>    Affects Versions: Airflow 1.8, 1.8.0rc4
>            Reporter: Vijay Krishna Ramesh
>            Assignee: Bolke de Bruin
>
> https://gist.github.com/vijaykramesh/707262c83429ab2a3f5ee701879813e3 provides a small example that consistently hits this problem with LocalExecutor.
> When a DAG run kicks off with concurrency > 1 on a LocalExecutor with parallelism > 2, the scheduler marks more tasks as queued than the concurrency limit allows (https://github.com/apache/incubator-airflow/blob/master/airflow/jobs.py#L1095).
> A second check just before the task actually runs (https://github.com/apache/incubator-airflow/blob/master/airflow/models.py#L1291) leaves the task in the QUEUED state, but the scheduler never picks it back up. This causes the DAG to get stuck, because the queued tasks never run, until the scheduler is restarted. At that point the queued tasks are considered orphaned, their state is set to NONE, and the scheduler picks them up again and runs them.
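
For readers following the report, here is a minimal toy model of the sequence described above. It is a sketch only and does not use or reproduce Airflow's code: ToyScheduler, run_loop, and the State enum are hypothetical names invented for illustration, and the single-threaded loop stands in for the real scheduler and LocalExecutor workers. It assumes the scheduler queues every unscheduled task without consulting the concurrency limit, and that the pre-run check silently leaves over-limit tasks in QUEUED.

```python
from enum import Enum


class State(Enum):
    NONE = "none"
    QUEUED = "queued"
    RUNNING = "running"
    SUCCESS = "success"


class ToyScheduler:
    """Toy model of the reported behaviour; not Airflow's actual code."""

    def __init__(self, task_ids, concurrency):
        self.concurrency = concurrency
        self.states = {t: State.NONE for t in task_ids}
        self.executor_queue = []

    def schedule(self):
        # Flaw being modelled: every unscheduled task is queued, with no
        # check against the concurrency limit.
        for task_id, state in self.states.items():
            if state is State.NONE:
                self.states[task_id] = State.QUEUED
                self.executor_queue.append(task_id)

    def start_queued(self):
        # Second check just before running: a task over the limit leaves
        # the executor queue but keeps its QUEUED state.
        while self.executor_queue:
            task_id = self.executor_queue.pop(0)
            running = sum(s is State.RUNNING for s in self.states.values())
            if running < self.concurrency:
                self.states[task_id] = State.RUNNING

    def finish_running(self):
        for task_id, state in self.states.items():
            if state is State.RUNNING:
                self.states[task_id] = State.SUCCESS

    def restart(self):
        # On restart, QUEUED tasks the executor does not know about are
        # treated as orphaned and reset so they can be scheduled again.
        for task_id, state in self.states.items():
            if state is State.QUEUED and task_id not in self.executor_queue:
                self.states[task_id] = State.NONE

    def count(self, state):
        return sum(s is state for s in self.states.values())


def run_loop(sched, rounds=5):
    # Repeated scheduler passes; extra rounds show that stuck tasks stay stuck.
    for _ in range(rounds):
        sched.schedule()
        sched.start_queued()
        sched.finish_running()


if __name__ == "__main__":
    sched = ToyScheduler(["t1", "t2", "t3", "t4"], concurrency=2)
    run_loop(sched)
    print("stuck in QUEUED:", sched.count(State.QUEUED))
    sched.restart()
    run_loop(sched)
    print("finished after restart:", sched.count(State.SUCCESS))
```

In this model the stuck tasks are only recovered by restart(), which mirrors the workaround in the report. A fix would presumably either respect the limit in schedule() or return over-limit tasks to a schedulable state in start_queued(); which approach the linked pull request takes is not stated in this message.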