Date: Tue, 10 Apr 2018 15:01:00 +0000 (UTC)
From: "Attila Zsolt Piros (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-16630) Blacklist a node if executors won't launch on it.

[ https://issues.apache.org/jira/browse/SPARK-16630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16432415#comment-16432415 ]

Attila Zsolt Piros commented on SPARK-16630:
--------------------------------------------

I would need the expiry times to choose the most relevant (freshest) subset of nodes to blacklist when the limit is smaller than the number of nodes in the union of all blacklistable nodes. So the expiry times are only used for sorting.

> Blacklist a node if executors won't launch on it.
> -------------------------------------------------
>
>                 Key: SPARK-16630
>                 URL: https://issues.apache.org/jira/browse/SPARK-16630
>             Project: Spark
>          Issue Type: Improvement
>          Components: YARN
>   Affects Versions: 1.6.2
>            Reporter: Thomas Graves
>            Priority: Major
>
> On YARN, it's possible that a node is messed up or misconfigured such that a container won't launch on it. For instance, the Spark external shuffle handler may not have been loaded on it, or there may be some other hardware issue or Hadoop configuration issue.
> It would be nice if we could recognize this happening and stop trying to launch executors on that node, since repeated attempts could cause us to hit the maximum number of executor failures and then kill the job.
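
The selection described in the comment above can be sketched roughly as follows. This is only an illustration, not Spark's actual code: the function name `select_nodes_to_blacklist`, the node names, and the timestamps are all hypothetical, and it assumes that a later expiry time means a fresher (more recently blacklisted) node.

```python
from typing import Dict, List


def select_nodes_to_blacklist(expiry_by_node: Dict[str, int], limit: int) -> List[str]:
    """Return at most `limit` node names, preferring the latest expiry times.

    Assumption: a later expiry time marks a fresher blacklist entry.
    """
    if limit <= 0:
        return []
    # Sort by expiry time, newest first; break ties by node name so the
    # result is deterministic.
    ranked = sorted(expiry_by_node.items(), key=lambda kv: (-kv[1], kv[0]))
    # The expiry times are used only for ordering; only the names are returned.
    return [node for node, _ in ranked[:limit]]


# Hypothetical union of all blacklistable nodes with their expiry timestamps.
candidates = {"node-a": 1000, "node-b": 3000, "node-c": 2000}
selected = select_nodes_to_blacklist(candidates, limit=2)
print(selected)
```

When the limit is at least as large as the union, every candidate node is returned and the ordering has no effect on which nodes are chosen.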