Return-Path: X-Original-To: archive-asf-public-internal@cust-asf2.ponee.io Delivered-To: archive-asf-public-internal@cust-asf2.ponee.io Received: from cust-asf.ponee.io (cust-asf.ponee.io [163.172.22.183]) by cust-asf2.ponee.io (Postfix) with ESMTP id 55323200D3F for ; Sat, 4 Nov 2017 19:13:07 +0100 (CET) Received: by cust-asf.ponee.io (Postfix) id 53BF21609EE; Sat, 4 Nov 2017 18:13:07 +0000 (UTC) Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by cust-asf.ponee.io (Postfix) with SMTP id 98125160BE9 for ; Sat, 4 Nov 2017 19:13:06 +0100 (CET) Received: (qmail 54691 invoked by uid 500); 4 Nov 2017 18:13:05 -0000 Mailing-List: contact issues-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list issues@spark.apache.org Received: (qmail 54576 invoked by uid 99); 4 Nov 2017 18:13:05 -0000 Received: from pnap-us-west-generic-nat.apache.org (HELO spamd3-us-west.apache.org) (209.188.14.142) by apache.org (qpsmtpd/0.29) with ESMTP; Sat, 04 Nov 2017 18:13:05 +0000 Received: from localhost (localhost [127.0.0.1]) by spamd3-us-west.apache.org (ASF Mail Server at spamd3-us-west.apache.org) with ESMTP id 083A81808F4 for ; Sat, 4 Nov 2017 18:13:05 +0000 (UTC) X-Virus-Scanned: Debian amavisd-new at spamd3-us-west.apache.org X-Spam-Flag: NO X-Spam-Score: -99.201 X-Spam-Level: X-Spam-Status: No, score=-99.201 tagged_above=-999 required=6.31 tests=[KAM_ASCII_DIVIDERS=0.8, RP_MATCHES_RCVD=-0.001, SPF_PASS=-0.001, URIBL_BLOCKED=0.001, USER_IN_WHITELIST=-100] autolearn=disabled Received: from mx1-lw-eu.apache.org ([10.40.0.8]) by localhost (spamd3-us-west.apache.org [10.40.0.10]) (amavisd-new, port 10024) with ESMTP id ZBAyDUPB1qbh for ; Sat, 4 Nov 2017 18:13:04 +0000 (UTC) Received: from mailrelay1-us-west.apache.org (mailrelay1-us-west.apache.org [209.188.14.139]) by mx1-lw-eu.apache.org (ASF Mail Server at mx1-lw-eu.apache.org) with ESMTP id 797395FB51 for ; Sat, 4 Nov 2017 18:13:03 +0000 (UTC) Received: from jira-lw-us.apache.org (unknown [207.244.88.139]) by mailrelay1-us-west.apache.org (ASF Mail Server at mailrelay1-us-west.apache.org) with ESMTP id 0FEF1E2593 for ; Sat, 4 Nov 2017 18:13:02 +0000 (UTC) Received: from jira-lw-us.apache.org (localhost [127.0.0.1]) by jira-lw-us.apache.org (ASF Mail Server at jira-lw-us.apache.org) with ESMTP id 0C4F6241BC for ; Sat, 4 Nov 2017 18:13:01 +0000 (UTC) Date: Sat, 4 Nov 2017 18:13:01 +0000 (UTC) From: "Xiao Li (JIRA)" To: issues@spark.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Reopened] (SPARK-22411) Heuristic to combine splits in DataSourceScanExec isn't accurate when dynamic allocation is enabled MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 archived-at: Sat, 04 Nov 2017 18:13:07 -0000 [ https://issues.apache.org/jira/browse/SPARK-22411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reopened SPARK-22411: ----------------------------- > Heuristic to combine splits in DataSourceScanExec isn't accurate when dynamic allocation is enabled > --------------------------------------------------------------------------------------------------- > > Key: SPARK-22411 > URL: https://issues.apache.org/jira/browse/SPARK-22411 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 2.0.0 > Reporter: Vinitha Reddy Gankidi > Assignee: Vinitha Reddy Gankidi > Priority: Major > Fix For: 2.3.0 > > > The heuristic to calculate the maxSplitSize in DataSourceScanExec is as follows: > https://github.com/apache/spark/blob/d28d5732ae205771f1f443b15b10e64dcffb5ff0/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L431 > Default parallelism in this case is the number of total cores of all the registered executors for this application. This works well with static allocation but with dynamic allocation enabled, this value is usually one (with default config of min and initial executors as zero) at the time of split calculation. This heuristic was introduced in SPARK-14582. > When Dynamic allocation it is confusing to tune the split size with this heuristic. It is better to ignore bytesPerCore and use the values of 'spark.sql.files.maxPartitionBytes' as the max split size. -- This message was sent by Atlassian JIRA (v6.4.14#64029) --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org For additional commands, e-mail: issues-help@spark.apache.org