Date: Sun, 23 Jul 2017 15:25:00 +0000 (UTC)
From: "Wenchen Fan (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Resolved] (SPARK-20904) Task failures during shutdown cause problems with preempted executors

     [ https://issues.apache.org/jira/browse/SPARK-20904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-20904.
---------------------------------
       Resolution: Fixed
    Fix Version/s: 2.3.0
                   2.2.1

Issue resolved by pull request 18594
[https://github.com/apache/spark/pull/18594]

> Task failures during shutdown cause problems with preempted executors
> ---------------------------------------------------------------------
>
>                 Key: SPARK-20904
>                 URL: https://issues.apache.org/jira/browse/SPARK-20904
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, YARN
>    Affects Versions: 1.6.0
>            Reporter: Marcelo Vanzin
>             Fix For: 2.2.1, 2.3.0
>
>
> Spark runs tasks in a thread pool that uses daemon threads in each executor. That means that when the JVM gets a signal to shut down, those tasks keep running.
> Now when YARN preempts an executor, it sends a SIGTERM to the process, triggering the JVM shutdown. That causes shutdown hooks to run which may cause user code running in those tasks to fail, and report task failures to the driver.
> Those failures are then counted towards the maximum number of allowed failures, even though in this case we don't want that because the executor was preempted.
> So we need a better way to handle that situation.
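
The accounting problem in the description can be modelled with a small Python sketch. It is an illustration only, not Spark code and not necessarily what pull request 18594 does: `FailureTracker`, `Executor`, `on_sigterm`, `run_task`, and the `during_shutdown` flag are all hypothetical names, and the rule "ignore failures reported during shutdown" is an assumed policy chosen to show one way the spurious failures could be kept out of the count.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class FailureTracker:
    """Hypothetical driver-side counter (not Spark's scheduler code)."""
    max_failures: int
    counted: Dict[int, int] = field(default_factory=dict)

    def record_failure(self, task_id: int, during_shutdown: bool) -> bool:
        """Record a failed attempt; return True if the task hit the limit."""
        if during_shutdown:
            # Assumption: a failure reported while the executor is shutting
            # down (for example after a preemption SIGTERM) is ignored.
            return False
        self.counted[task_id] = self.counted.get(task_id, 0) + 1
        return self.counted[task_id] >= self.max_failures


@dataclass
class Executor:
    """Hypothetical executor that remembers whether shutdown has begun."""
    tracker: FailureTracker
    shutting_down: bool = False

    def on_sigterm(self) -> None:
        # Stand-in for the start of JVM shutdown after a SIGTERM.
        self.shutting_down = True

    def run_task(self, task_id: int, body: Callable[[], None]) -> bool:
        """Run body; return True if the failure limit was reached."""
        try:
            body()
            return False
        except Exception:
            # Tag the failure with the shutdown state before counting it.
            return self.tracker.record_failure(task_id, self.shutting_down)
```

In this model, a task that fails after `on_sigterm()` has been called leaves `tracker.counted` unchanged, while the same failure on a healthy executor increments the count and eventually reaches `max_failures`.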