Date: Fri, 15 Dec 2017 05:30:04 +0000 (UTC)
From: "Xuefu Zhang (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-22765) Create a new executor allocation scheme based on that of MR


    [ https://issues.apache.org/jira/browse/SPARK-22765?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16292048#comment-16292048 ]

Xuefu Zhang commented on SPARK-22765:
-------------------------------------

[~tgraves], I think it would help if SPARK-21656 can make a close-to-zero idle time workable. Executor idle time is one source of inefficiency. Our version is too old to backport the fix, but we will try it out when we upgrade.

The second source of inefficiency comes from the fact that Spark favors bigger containers. A 4-core container might be running one task while wasting the other cores and memory. The executor cannot die as long as one task is still running. One might argue that a user could configure 1-core containers under dynamic allocation, but this is probably not optimal in other respects.

The third reason that one might favor MR-styled scheduling is its simplicity and efficiency. We frequently found that under heavy workloads the scheduler cannot really keep up with the task ups and downs, especially when the tasks finish fast.

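To put a number on the second point above, here is a minimal sketch of the arithmetic. It is not Spark code: the function name `idle_core_fraction` and its parameters are hypothetical, and it assumes each task occupies a fixed number of cores for as long as it runs.

```python
def idle_core_fraction(executor_cores: int, running_tasks: int,
                       cpus_per_task: int = 1) -> float:
    """Fraction of an executor's cores that are held but not running a task.

    Hypothetical helper for illustration only; it models an executor as a
    fixed pool of cores that stays allocated while any task is running.
    """
    if executor_cores <= 0 or cpus_per_task <= 0:
        raise ValueError("executor_cores and cpus_per_task must be positive")
    if running_tasks < 0:
        raise ValueError("running_tasks must be non-negative")
    # An executor cannot run more tasks than it has cores for.
    busy = min(running_tasks * cpus_per_task, executor_cores)
    return (executor_cores - busy) / executor_cores


# A 4-core container running a single 1-core task.
print(idle_core_fraction(4, 1))  # 0.75
# A 1-core container running one task has no idle cores.
print(idle_core_fraction(1, 1))  # 0.0
```

Under these assumptions, the 4-core executor keeps three of its four cores (and the memory that goes with them) reserved for the whole lifetime of its last running task, which is the waste described above.
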
For cost-conscious users, cluster-level resource efficiency is probably what matters most. My suspicion is that an enhanced MR-styled scheduling, simple and performant, will be significantly more resource-efficient than a typical use of dynamic allocation, without sacrificing much performance. As a starting point, we will first benchmark with SPARK-21656 when possible.

> Create a new executor allocation scheme based on that of MR
> -----------------------------------------------------------
>
>                 Key: SPARK-22765
>                 URL: https://issues.apache.org/jira/browse/SPARK-22765
>             Project: Spark
>          Issue Type: Improvement
>          Components: Scheduler
>    Affects Versions: 1.6.0
>            Reporter: Xuefu Zhang
>
> Many users migrating their workload from MR to Spark find a significant resource consumption hike (i.e., SPARK-22683). While this might not be a concern for users that are more performance-centric, for others who are conscious about cost, such a hike creates a migration obstacle. This situation can get worse as more users move to the cloud.
> Dynamic allocation makes it possible for Spark to be deployed in a multi-tenant environment. With its performance-centric design, its inefficiency has unfortunately also shown up, especially when compared with MR. Thus, it's believed that an MR-styled scheduler still has its merit. Based on our research, the inefficiency associated with dynamic allocation comes from many aspects, such as executors idling out, bigger executors, many stages in a Spark job (rather than only 2 stages in MR), etc.
> Rather than fine-tuning dynamic allocation for efficiency, the proposal here is to add a new, efficiency-centric scheduling scheme based on that of MR. Such an MR-based scheme can be further enhanced and better adapted to the Spark execution model. This alternative is expected to offer a good performance improvement (compared to MR) while retaining efficiency similar to or even better than that of MR.
> Inputs are greatly welcome!

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org