Date: Tue, 9 Aug 2016 07:05:20 +0000 (UTC)
From: "Nadeem Ahmed Nazeer (JIRA)"
To: commits@airflow.incubator.apache.org
Reply-To: dev@airflow.incubator.apache.org
Subject: [jira] [Comment Edited] (AIRFLOW-401) scheduler gets stuck without a trace

    [ https://issues.apache.org/jira/browse/AIRFLOW-401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15413070#comment-15413070 ]

Nadeem Ahmed Nazeer edited comment on AIRFLOW-401 at 8/9/16 7:04 AM:
---------------------------------------------------------------------

scheduler in a loop for more than 7 hours. screenshot attached.

was (Author: nadeem):
scheduler in a loop for more than 7 hours!

> scheduler gets stuck without a trace
> ------------------------------------
>
>                 Key: AIRFLOW-401
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-401
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: executor, scheduler
>    Affects Versions: Airflow 1.7.1.3
>            Reporter: Nadeem Ahmed Nazeer
>            Assignee: Bolke de Bruin
>            Priority: Minor
>         Attachments: Dag_code.txt, schduler_cpu100%.png, scheduler_stuck.png, scheduler_stuck_7hours.png
>
>
> The scheduler gets stuck without a trace or error. When this happens, the CPU usage of scheduler service is at 100%. No jobs get submitted and everything comes to a halt.
> Looks like it goes into some kind of infinite loop.
> The only way I could make it run again is by manually restarting the scheduler service. But again, after running some tasks it gets stuck. I've tried with both the Celery and Local executors, but the same issue occurs. I am using the -n 3 parameter while starting the scheduler.
> Scheduler configs,
> job_heartbeat_sec = 5
> scheduler_heartbeat_sec = 5
> executor = LocalExecutor
> parallelism = 32
> Please help. I would be happy to provide any other information needed.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)