Date: Wed, 22 Jan 2020 16:40:00 +0000 (UTC)
From: "Wei Xue (Jira)"
To: issues@spark.apache.org
Subject: [jira] [Commented] (SPARK-30528) DPP issues

    [ https://issues.apache.org/jira/browse/SPARK-30528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17021257#comment-17021257 ]

Wei Xue commented on SPARK-30528:
---------------------------------

# Turning on the non-broadcast-reuse DPP by default could be an option.
# So it really depends: a bloom filter would reduce the partition filtering rate, thus getting you less benefit, and the penalties are (without an extra aggregate):
## Duplicate scans or other ops like joins, if the filter plan is partially reused;
## Otherwise, you could implement something just like broadcast reuse to fully reuse the shuffle exchange of the filter plan, but you need to be aware that you are then serializing the computation of the two child operators of an SMJ.
# Yes, you could.

> DPP issues
> ----------
>
>                 Key: SPARK-30528
>                 URL: https://issues.apache.org/jira/browse/SPARK-30528
>             Project: Spark
>          Issue Type: Bug
>          Components: Optimizer
>    Affects Versions: 3.0.0
>            Reporter: Mayur Bhosale
>            Priority: Major
>              Labels: performance
>         Attachments: cases.png, dup_subquery.png, plan.png
>
>
> In DPP, the heuristic that decides whether DPP is going to be beneficial relies on the sizes of the tables in the right subtree of the join. This might not be a correct estimate, especially when detailed column-level stats are not available.
> {code:java}
>     // the pruning overhead is the total size in bytes of all scan relations
>     val overhead = otherPlan.collectLeaves().map(_.stats.sizeInBytes).sum.toFloat
>     filterRatio * partPlan.stats.sizeInBytes.toFloat > overhead.toFloat
> {code}
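> As a rough, purely illustrative plug-in of numbers into the check above (assuming no column-level stats, so the fallback ratio from spark.sql.optimizer.dynamicPartitionPruning.fallbackFilterRatio, 0.5 by default, gets used):
> {code:java}
>     // Hypothetical sizes: a ~1 TB partitioned fact table on the pruned side
>     // and ~100 GB of leaf scans in the right subtree (the filtering side).
>     val filterRatio = 0.5f
>     val partPlanSizeInBytes = 1024L * 1024 * 1024 * 1024   // ~1 TB
>     val overhead = 100L * 1024 * 1024 * 1024                // ~100 GB of scans
>
>     // 512 GB > 100 GB, so pruning is judged beneficial, even though the join
>     // between the right-subtree tables is never costed.
>     val hasBenefit = filterRatio * partPlanSizeInBytes.toFloat > overhead.toFloat
> {code}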
> Also, DPP executes the entire right side of the join as a subquery, because of which multiple scans happen for the tables in the right subtree of the join. This can cause issues when the join is not a Broadcast Hash Join (BHJ) and reuse of the subquery result does not happen. Also, I couldn't figure out: why do the results from the subquery get reused only for BHJ?
>
> Consider a query,
> {code:java}
> SELECT *
> FROM store_sales_partitioned
>      JOIN (SELECT *
>            FROM store_returns_partitioned,
>                 date_dim
>            WHERE sr_returned_date_sk = d_date_sk) ret_date
>        ON ss_sold_date_sk = d_date_sk
> WHERE d_fy_quarter_seq > 0
> {code}
> DPP will kick in for both of the joins. (Please check the image plan.png attached below for the plan.)
> Some of the observations -
> * Based on the heuristics, DPP would go ahead with pruning if the cost of scanning the tables in the right subtree of the join is less than the benefit due to pruning. This is because multiple scans of those tables will be needed for an SMJ. But the heuristic simply checks whether the benefit offsets the cost of the multiple scans and does not take into consideration other operations, like joins, in the right subtree, which can be quite expensive. This issue will be particularly prominent when detailed column-level stats are not available. In the example above, the decision that pruningHasBenefit was made on the basis of the sizes of the tables store_returns_partitioned and date_dim, but did not take into consideration the join between them, which happens before the join with the store_sales_partitioned table.
> * Multiple scans are needed when the join is an SMJ, as the reuse of the exchanges does not happen. This is because an Aggregate gets added on top of the right subtree, which is then executed as a subquery in order to keep only the required columns. Here, scanning all the columns (as the right subtree of the join would) and reusing the same exchange might be more helpful, as it avoids duplicate scans.
>
> This was just a representative example, but in general, for cases such as the ones in the image cases.png below, DPP can cause performance issues.
>
> Also, for cases where there are multiple DPP-compatible join conditions in the same join, the entire right subtree of the join would be executed as a subquery that many times. Consider an example,
> {code:java}
> SELECT *
> FROM partitionedtable
>      JOIN nonpartitionedtable
>        ON partcol1 = col1
>           AND partcol2 = col2
> WHERE nonpartitionedtable.id > 0
> {code}
> Here the right subtree of the join (the scan of table nonpartitionedtable) would be executed twice as a subquery, once for each join condition. These two subqueries should be merged and executed only once, as they are almost the same apart from the columns that they prune. Check the image dup_subquery.png attached below for the details.
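> A quick way to observe this (a sketch against the hypothetical tables from the query above, with DPP enabled, which is the default in 3.0):
> {code:java}
> // In spark-shell (Spark 3.0):
> spark.sql("""
>   SELECT *
>   FROM partitionedtable
>        JOIN nonpartitionedtable
>          ON partcol1 = col1
>             AND partcol2 = col2
>   WHERE nonpartitionedtable.id > 0
> """).explain(true)
> // The extended plan shows two dynamic pruning subqueries, one per partition
> // column, each containing its own scan of nonpartitionedtable.
> {code}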