Date: Mon, 26 Sep 2016 21:31:21 +0000 (UTC)
From: "Siddharth Anand (JIRA)"
To: commits@airflow.incubator.apache.org
Subject: [jira] [Commented] (AIRFLOW-462) Concurrent Scheduler Jobs pushing the same task to queue

    [ https://issues.apache.org/jira/browse/AIRFLOW-462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15524228#comment-15524228 ] 

Siddharth Anand commented on AIRFLOW-462:
-----------------------------------------

Hi! A few caveats: we ended up turning off dual schedulers because of another issue, namely how we currently log. We also use the LocalExecutor, not the CeleryExecutor, and have never tested with the CeleryExecutor. If you are using the CeleryExecutor, the idea is not to run more than one scheduler. The reason I opted for dual schedulers was to run the LocalExecutor on more machines.

However, as changes are being made to the scheduler, I don't believe running dual schedulers will be maintained going forward. It's not a first-level feature that most people require.
-s

> Concurrent Scheduler Jobs pushing the same task to queue
> --------------------------------------------------------
>
>                 Key: AIRFLOW-462
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-462
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: scheduler
>    Affects Versions: Airflow 1.7.0
>            Reporter: Yogesh
>            Priority: Blocker
>
> Hi,
> We are using Airflow version 1.7.0, and we tried to implement high availability for the Airflow daemons in our production environment.
> Detailed high-availability approach:
> - Airflow running on two different machines, each with all the daemons (webserver, scheduler, executor)
> - A single MySQL database that both schedulers point to
> - DAG files replicated on both machines
> - A single RabbitMQ instance as the message broker
> While doing so, we came across the following problem:
> - A particular task was sent to the executor twice (two entries in the message queue) by two different schedulers. However, we see only a single entry for the task instance in the database, which is correct.
> We checked the code and found the following:
> - Before sending a task to the executor, the scheduler checks the task's state in the database, and if it is not already QUEUED, it pushes the task to the queue.
> Issue:
> There is no locking on the task instance in the database, and the two scheduler jobs run so close together that the second one can check the status in the database before the first one updates it to QUEUED.
> We are not sure whether this issue has been addressed in a recent release.
> Could you please suggest an appropriate approach so that high availability can be achieved?
> Thanks
> Yogesh

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
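The race described in the report is a check-then-act pattern. Each scheduler reads the task state, then writes QUEUED and enqueues the task, and nothing stops both reads from happening before either write. Below is a minimal Python sketch, not Airflow's actual code, that contrasts that pattern with an atomic conditional UPDATE (compare-and-set). In the conditional UPDATE, the state test and the write are a single statement, so at most one scheduler can claim the task. Everything here is a hypothetical illustration: the task_instance schema, the lowercase state strings, and the functions make_db, naive_check, naive_act, try_claim, simulate_naive and simulate_atomic. An in-memory SQLite database on one connection stands in for the shared MySQL database, and the harmful interleaving is forced deterministically rather than produced by real concurrent schedulers.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    """Hypothetical minimal schema; NOT Airflow's real task_instance table."""
    conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
    conn.execute("CREATE TABLE task_instance (task_id TEXT PRIMARY KEY, state TEXT)")
    conn.execute("INSERT INTO task_instance VALUES ('t1', 'scheduled')")
    return conn


def naive_check(conn: sqlite3.Connection, task_id: str) -> bool:
    """Racy step 1: read the state and decide whether to queue."""
    (state,) = conn.execute(
        "SELECT state FROM task_instance WHERE task_id = ?", (task_id,)
    ).fetchone()
    return state != "queued"


def naive_act(conn: sqlite3.Connection, task_id: str) -> None:
    """Racy step 2: write 'queued' without re-checking the state."""
    conn.execute(
        "UPDATE task_instance SET state = 'queued' WHERE task_id = ?", (task_id,)
    )


def try_claim(conn: sqlite3.Connection, task_id: str) -> bool:
    """Compare-and-set: test and write in one UPDATE; True only for the winner."""
    cur = conn.execute(
        "UPDATE task_instance SET state = 'queued' "
        "WHERE task_id = ? AND state != 'queued'",
        (task_id,),
    )
    return cur.rowcount == 1


def simulate_naive() -> list:
    """Both schedulers read before either writes, as in the report."""
    conn = make_db()
    message_queue = []
    a_ok = naive_check(conn, "t1")  # scheduler A sees 'scheduled'
    b_ok = naive_check(conn, "t1")  # scheduler B also sees 'scheduled'
    for name, ok in (("A", a_ok), ("B", b_ok)):
        if ok:
            naive_act(conn, "t1")
            message_queue.append(("t1", name))
    return message_queue


def simulate_atomic() -> list:
    """Each scheduler enqueues only if its conditional UPDATE changed the row."""
    conn = make_db()
    message_queue = []
    for name in ("A", "B"):
        if try_claim(conn, "t1"):
            message_queue.append(("t1", name))
    return message_queue


print(simulate_naive())   # [('t1', 'A'), ('t1', 'B')]  -> duplicate queue entries
print(simulate_atomic())  # [('t1', 'A')]               -> single queue entry
```

With one connection per scheduler the idea is the same: the scheduler whose UPDATE reports one affected row pushes the task, and the other skips it. On MySQL with InnoDB, locking the row with SELECT ... FOR UPDATE inside a transaction before checking and writing the state is another way to serialize the two schedulers.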