Date: Wed, 1 Nov 2017 19:13:00 +0000 (UTC)
From: "Vinitha Reddy Gankidi (JIRA)"
To: issues@spark.apache.org
Subject: [jira] [Created] (SPARK-22411) Heuristic to combine splits in DataSourceScanExec isn't accurate when dynamic allocation is enabled

Vinitha Reddy Gankidi created SPARK-22411:
---------------------------------------------

             Summary: Heuristic to combine splits in DataSourceScanExec isn't accurate when dynamic allocation is enabled
                 Key: SPARK-22411
                 URL: https://issues.apache.org/jira/browse/SPARK-22411
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.0.0
            Reporter: Vinitha Reddy Gankidi
            Priority: Major

The heuristic that DataSourceScanExec uses to calculate the maxSplitSize is defined here:
https://github.com/apache/spark/blob/d28d5732ae205771f1f443b15b10e64dcffb5ff0/sql/core/src/main/scala/org/apache/spark/sql/execution/DataSourceScanExec.scala#L431

In this heuristic, the default parallelism is the total number of cores across all executors registered to the application. This works well with static allocation. With dynamic allocation enabled, however, this value is usually one at the time the splits are calculated, because in the default configuration the minimum and initial numbers of executors are both zero. This heuristic was introduced in SPARK-14582.

When dynamic allocation is enabled, this heuristic makes the split size confusing to tune.
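To make the dependence on registered cores concrete, here is a small model of the split-size arithmetic. It is a sketch that assumes the linked Spark 2.x code has the form maxSplitBytes = min(maxPartitionBytes, max(openCostInBytes, bytesPerCore)), with bytesPerCore = totalBytes / defaultParallelism and every file charged its length plus openCostInBytes. It uses the defaults of 128 MiB for spark.sql.files.maxPartitionBytes and 4 MiB for spark.sql.files.openCostInBytes. The function name and the file sizes are hypothetical.

```python
MiB = 1024 * 1024

# Defaults of spark.sql.files.maxPartitionBytes and spark.sql.files.openCostInBytes.
MAX_PARTITION_BYTES = 128 * MiB
OPEN_COST_IN_BYTES = 4 * MiB


def max_split_bytes(file_lengths, default_parallelism,
                    max_partition_bytes=MAX_PARTITION_BYTES,
                    open_cost_in_bytes=OPEN_COST_IN_BYTES):
    """Hypothetical model of the split-size heuristic (assumed Spark 2.x form)."""
    # Every file is charged its length plus a fixed cost for opening it.
    total_bytes = sum(length + open_cost_in_bytes for length in file_lengths)
    # Integer division, mirroring Long arithmetic in the Scala source.
    bytes_per_core = total_bytes // default_parallelism
    return min(max_partition_bytes, max(open_cost_in_bytes, bytes_per_core))


# Hypothetical input: 100 files of 124 MiB, i.e. 12800 MiB including open costs.
FILES = [124 * MiB] * 100

for cores in (1, 200, 1600, 6400):
    size_mib = max_split_bytes(FILES, cores) // MiB
    print(f"defaultParallelism={cores}: maxSplitBytes={size_mib} MiB")
# defaultParallelism=1: maxSplitBytes=128 MiB
# defaultParallelism=200: maxSplitBytes=64 MiB
# defaultParallelism=1600: maxSplitBytes=8 MiB
# defaultParallelism=6400: maxSplitBytes=4 MiB
```

With a single registered core, the split size saturates at maxPartitionBytes. The same data planned while hundreds of cores happen to be registered gets much smaller splits. The effective split size therefore depends on how many executors exist at planning time, not only on the data and the configuration.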
It would be better to ignore bytesPerCore and always use the value of 'spark.sql.files.maxPartitionBytes' as the max split size.
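The split size also controls how small files are combined into partitions. The sketch below assumes that Spark packs files with a "next fit decreasing" pass: files are sorted by size, a partition is closed when the next file would push it past the split size, and each file is charged its length plus openCostInBytes (default 4 MiB). It ignores the chunking of files larger than the split size. The function name and the input are hypothetical.

```python
MiB = 1024 * 1024
OPEN_COST_IN_BYTES = 4 * MiB  # default of spark.sql.files.openCostInBytes


def pack_next_fit_decreasing(file_lengths, max_split_bytes,
                             open_cost_in_bytes=OPEN_COST_IN_BYTES):
    """Hypothetical model: group file lengths into partitions (lists of lengths)."""
    partitions = []
    current = []
    current_size = 0
    for length in sorted(file_lengths, reverse=True):
        # Close the current partition if adding this file would exceed the limit.
        if current and current_size + length > max_split_bytes:
            partitions.append(current)
            current = []
            current_size = 0
        current.append(length)
        # Charge the file's length plus the fixed open cost.
        current_size += length + open_cost_in_bytes
    if current:
        partitions.append(current)
    return partitions


# Hypothetical input: 100 small files of 12 MiB each.
FILES = [12 * MiB] * 100

for split_mib in (128, 64, 16):
    parts = pack_next_fit_decreasing(FILES, split_mib * MiB)
    print(f"maxSplitBytes={split_mib} MiB: {len(parts)} partitions")
# maxSplitBytes=128 MiB: 13 partitions
# maxSplitBytes=64 MiB: 25 partitions
# maxSplitBytes=16 MiB: 100 partitions
```

If the split size were fixed at maxPartitionBytes (128 MiB by default), this input would always produce 13 partitions. Under the current heuristic, min(maxPartitionBytes, max(openCostInBytes, totalBytes / defaultParallelism)), the 1600 MiB of charged bytes divided by 25 or 100 registered cores gives 64 MiB or 16 MiB splits, and therefore 25 or 100 partitions.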