Date: Thu, 20 Jul 2017 23:23:00 +0000 (UTC)
From: "Thomas Graves (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-21460) Spark dynamic allocation breaks when ListenerBus event queue runs full

[ https://issues.apache.org/jira/browse/SPARK-21460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16095546#comment-16095546 ]

Thomas Graves commented on SPARK-21460:
---------------------------------------

I didn't think that was the case, but I took a look at the code and I guess I was wrong: dynamic allocation definitely appears to rely on the listener bus. That is really bad in my opinion. We are intentionally dropping events, and we know that will cause issues.
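To illustrate the failure mode being discussed, here is a minimal, hypothetical sketch (not Spark's actual LiveListenerBus or ExecutorAllocationManager code): a bounded event queue silently drops an event once full, so a listener that counts running tasks from start/end events is left with a stale nonzero count, and the "no active work, release executors" condition is never reached.

```python
from collections import deque

# Hypothetical, simplified model of the failure mode; this is not
# Spark's actual LiveListenerBus or ExecutorAllocationManager code.

class BoundedEventQueue:
    """Bounded queue that silently drops events once full,
    like a full ListenerBus event queue."""
    def __init__(self, capacity):
        self.queue = deque()
        self.capacity = capacity
        self.dropped = 0

    def post(self, event):
        if len(self.queue) >= self.capacity:
            self.dropped += 1  # event is lost; listeners never see it
            return False
        self.queue.append(event)
        return True

class AllocationTracker:
    """Counts running tasks from start/end events; executors can only
    be released when the count returns to zero."""
    def __init__(self):
        self.running = 0

    def on_event(self, event):
        kind, _task_id = event
        if kind == "task_start":
            self.running += 1
        elif kind == "task_end":
            self.running -= 1

    def can_release_executors(self):
        return self.running == 0

bus = BoundedEventQueue(capacity=3)
tracker = AllocationTracker()

# Two tasks start and finish, but the queue fills up before the last
# task_end can be posted, so that event is dropped.
for event in [("task_start", 1), ("task_start", 2),
              ("task_end", 1), ("task_end", 2)]:
    bus.post(event)

# Drain the queue into the listener.
while bus.queue:
    tracker.on_event(bus.queue.popleft())

# One task_end was dropped, so the tracker still "sees" a running task,
# and dynamic allocation would never scale the idle executors down.
print(bus.dropped)                      # 1
print(tracker.can_release_executors())  # False
```

The names and queue capacity here are invented for illustration; the point is only that lossy event delivery turns the listener's state into an unrecoverable undercount or overcount.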
> Spark dynamic allocation breaks when ListenerBus event queue runs full
> ----------------------------------------------------------------------
>
>                 Key: SPARK-21460
>                 URL: https://issues.apache.org/jira/browse/SPARK-21460
>             Project: Spark
>          Issue Type: Bug
>          Components: Scheduler, YARN
>    Affects Versions: 2.0.0, 2.0.2, 2.1.0, 2.1.1, 2.2.0
>         Environment: Spark 2.1
>                      Hadoop 2.6
>            Reporter: Ruslan Dautkhanov
>            Priority: Critical
>              Labels: dynamic_allocation, performance, scheduler, yarn
>
> When the ListenerBus event queue runs full, Spark dynamic allocation stops working: Spark fails to shrink the number of executors when there are no active jobs, because the driver "thinks" there are still active jobs (it never received the events marking them finished).
> P.S. What's worse, it also makes Spark flood the YARN ResourceManager with reservation requests, so YARN preemption doesn't function properly either (we're on Spark 2.1 / Hadoop 2.6).

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org