From reviews-return-1238037-archive-asf-public=cust-asf.ponee.io@spark.apache.org Fri Jan 15 06:36:13 2021 Return-Path: X-Original-To: archive-asf-public@cust-asf.ponee.io Delivered-To: archive-asf-public@cust-asf.ponee.io Received: from mxout1-ec2-va.apache.org (mxout1-ec2-va.apache.org [3.227.148.255]) by mx-eu-01.ponee.io (Postfix) with ESMTPS id 1B8E5180654 for ; Fri, 15 Jan 2021 07:36:13 +0100 (CET) Received: from mail.apache.org (mailroute1-lw-us.apache.org [207.244.88.153]) by mxout1-ec2-va.apache.org (ASF Mail Server at mxout1-ec2-va.apache.org) with SMTP id 5758F41C14 for ; Fri, 15 Jan 2021 06:36:12 +0000 (UTC) Received: (qmail 52169 invoked by uid 500); 15 Jan 2021 06:36:12 -0000 Mailing-List: contact reviews-help@spark.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Delivered-To: mailing list reviews@spark.apache.org Received: (qmail 52156 invoked by uid 99); 15 Jan 2021 06:36:12 -0000 Received: from ec2-52-202-80-70.compute-1.amazonaws.com (HELO gitbox.apache.org) (52.202.80.70) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 15 Jan 2021 06:36:12 +0000 From: =?utf-8?q?GitBox?= To: reviews@spark.apache.org Subject: =?utf-8?q?=5BGitHub=5D_=5Bspark=5D_wangyum_opened_a_new_pull_request_=233119?= =?utf-8?q?3=3A_=5BSPARK-34117=5D=5BSQL=5D_Disable_LeftSemi/LeftAnti_push_do?= =?utf-8?q?wn_over_Aggregate?= Message-ID: Date: Fri, 15 Jan 2021 06:36:12 -0000 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit wangyum opened a new pull request #31193: URL: https://github.com/apache/spark/pull/31193 ### What changes were proposed in this pull request? This pr disable LeftSemi/LeftAnti push down over Aggregate. ### Why are the changes needed? LeftSemi/LeftAnti push down over Aggregate may affect performance. for example: ```sql SELECT i_item_sk ss_item_sk FROM item, (SELECT distinct iss.i_brand_id brand_id, iss.i_class_id class_id, iss.i_category_id category_id FROM store_sales, item iss, date_dim d1 WHERE ss_item_sk = iss.i_item_sk AND ss_sold_date_sk = d1.d_date_sk AND d1.d_year BETWEEN 1999 AND 1999 + 2 INTERSECT SELECT distinct ics.i_brand_id, ics.i_class_id, ics.i_category_id FROM catalog_sales, item ics, date_dim d2 WHERE cs_item_sk = ics.i_item_sk AND cs_sold_date_sk = d2.d_date_sk AND d2.d_year BETWEEN 1999 AND 1999 + 2 INTERSECT SELECT distinct iws.i_brand_id, iws.i_class_id, iws.i_category_id FROM web_sales, item iws, date_dim d3 WHERE ws_item_sk = iws.i_item_sk AND ws_sold_date_sk = d3.d_date_sk AND d3.d_year BETWEEN 1999 AND 1999 + 2) x WHERE i_brand_id = brand_id AND i_class_id = class_id AND i_category_id = category_id; ``` This query is rewritten from [q14b](https://github.com/apache/spark/blob/a78d6ce376edf2a8836e01f47b9dff5371058d4c/sql/core/src/test/resources/tpcds/q14b.sql#L2-L32). CBO enabled | CBO disabled -- | -- The statistics are more accurate, so it will be converted to broadcast join, and then push down aggregate. | It will not push down aggregate. ![image](https://issues.apache.org/jira/secure/attachment/13018801/current.jpg) | ![image](https://issues.apache.org/jira/secure/attachment/13018802/disable_pushdown.jpg) ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Unit test. ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: users@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org For additional commands, e-mail: reviews-help@spark.apache.org