Date: Mon, 5 Mar 2018 23:17:00 +0000 (UTC)
From: "Ash Berlin-Taylor (JIRA)"
To: commits@airflow.incubator.apache.org
Subject: [jira] [Closed] (AIRFLOW-160) Parse DAG files through child processes

     [ https://issues.apache.org/jira/browse/AIRFLOW-160?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ash Berlin-Taylor closed AIRFLOW-160.
-------------------------------------
    Resolution: Fixed
 Fix Version/s: Airflow 1.8

Fixed by https://github.com/apache/incubator-airflow/commit/fdb7e949140b735b8554ae5b22ad752e86f6ebaf

> Parse DAG files through child processes
> ---------------------------------------
>
>                 Key: AIRFLOW-160
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-160
>             Project: Apache Airflow
>          Issue Type: Improvement
>          Components: scheduler
>            Reporter: Paul Yang
>            Assignee: Paul Yang
>            Priority: Major
>             Fix For: Airflow 1.8
>
>
> Currently, the Airflow scheduler parses all user DAG files in the same process as the scheduler itself. We've seen issues in production where bad DAG files cause the scheduler to fail. For example, if a user script calls `sys.exit(1)`, the scheduler exits as well. We've also seen an unusual case where modules loaded by a user DAG affected the operation of the scheduler. For better uptime, the scheduler should be resistant to these problematic user DAGs.
>
> The proposed solution is to parse and schedule user DAGs through child processes.
> This way, the main scheduler process is more isolated from bad DAGs. There's a side benefit as well: since parsing is distributed among multiple processes, the DAG files can be parsed more frequently, reducing the latency between when a DAG is modified and when the change is picked up.
>
> Another issue right now is that all DAGs must be scheduled before any tasks are sent to the executor, so the frequency of task scheduling is limited by the slowest DAG to schedule. The changes needed to schedule DAGs through child processes will also make it easy to decouple these steps, so that each DAG's tasks are scheduled and sent to the executor independently. This way, overall scheduling won't be held back by a slow DAG.
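The isolation argument in the issue (a user file calling `sys.exit(1)` should not take the scheduler down) can be illustrated with Python's standard `multiprocessing` module. This is a hypothetical sketch, not the code from the linked commit: `parse_dag_file_isolated` and `_parse_in_child` are illustrative names, and the "fork" start method assumes a POSIX platform. A bad DAG file only ends the child process; the caller simply observes a non-zero exit code.

```python
import multiprocessing
import runpy


def _parse_in_child(path, queue):
    """Hypothetical helper: execute one DAG file inside the child process."""
    module_globals = runpy.run_path(path)
    # Send back only small, picklable data (the top-level names the file
    # defined); any DAG objects themselves stay in the child.
    queue.put(sorted(name for name in module_globals if not name.startswith("__")))


def parse_dag_file_isolated(path, timeout=30.0):
    """Hypothetical helper: parse `path` in a separate process.

    Returns (exitcode, names); names is None if the child did not report.
    A sys.exit() or crash in the DAG file only terminates the child.
    """
    # Assumption: POSIX "fork" start method, so the child does not
    # re-import the caller's __main__ module.
    ctx = multiprocessing.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=_parse_in_child, args=(path, queue))
    proc.start()
    proc.join(timeout)
    if proc.is_alive():
        # A DAG file that hangs is killed instead of stalling the caller.
        proc.terminate()
        proc.join()
    names = None if queue.empty() else queue.get_nowait()
    return proc.exitcode, names
```

For a file containing `import sys; sys.exit(1)` this returns `(1, None)` while the calling process keeps running; a file defining only `dag_id = 'example_dag'` returns `(0, ['dag_id'])`. Returning only a short list keeps the example clear of the deadlock that can occur when joining a process that still has large data buffered in a queue; a real scheduler would need to ship back richer, serialized results.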
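The last two paragraphs combine two ideas: spreading DAG processing across several processes, and handing each DAG's tasks to the executor as soon as that DAG is done rather than after the slowest one. A minimal sketch with `concurrent.futures.ProcessPoolExecutor` and `as_completed`, again assuming a POSIX "fork" start method. `schedule_dag` and `run_scheduling_pass` are hypothetical stand-ins (the sleep simulates a slow DAG file), and `send_to_executor` is any callable playing the role of the executor; none of these are Airflow APIs.

```python
import concurrent.futures
import multiprocessing
import time


def schedule_dag(dag_id, delay):
    """Hypothetical stand-in for parsing and scheduling one DAG.

    Sleeps `delay` seconds to mimic a slow DAG file, then returns the
    task identifiers that would be queued for it.
    """
    time.sleep(delay)
    return [f"{dag_id}.task_{i}" for i in range(2)]


def run_scheduling_pass(dag_delays, send_to_executor, max_workers=4):
    """Schedule DAGs in worker processes; forward each DAG's tasks as soon
    as that DAG finishes instead of waiting for every DAG."""
    ctx = multiprocessing.get_context("fork")  # assumption: POSIX platform
    with concurrent.futures.ProcessPoolExecutor(
        max_workers=max_workers, mp_context=ctx
    ) as pool:
        futures = [
            pool.submit(schedule_dag, dag_id, delay)
            for dag_id, delay in dag_delays.items()
        ]
        # as_completed yields futures in completion order, so a slow DAG
        # does not delay the hand-off of tasks from faster DAGs.
        for future in concurrent.futures.as_completed(futures):
            for task in future.result():
                send_to_executor(task)
```

With `{"slow_dag": 1.0, "fast_dag": 0.0}` and `sent.append` as the executor, the two `fast_dag` tasks should arrive about a second before the `slow_dag` tasks, even though `slow_dag` was submitted first.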