spark-issues mailing list archives

From "Eric Liang (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SPARK-18814) CheckAnalysis rejects TPCDS query 32
Date Sat, 10 Dec 2016 03:53:58 GMT

    [ https://issues.apache.org/jira/browse/SPARK-18814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15737148#comment-15737148 ]

Eric Liang commented on SPARK-18814:
------------------------------------

It seems that the references of an Alias expression should include the
referenced attribute, so I would expect #39 to still show up. I could be
misunderstanding the behavior of Alias though.
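The expectation above can be sketched with a toy model of expression reference propagation. This is illustrative only, not Spark's actual Catalyst `Alias`/`AttributeReference` classes; the class and attribute names here are made up for the sketch:

```python
# Toy model of Catalyst-style reference tracking (illustrative, not
# Spark's real implementation).

class Attribute:
    """A named column with a unique expression id, e.g. cs_item_sk#39."""
    def __init__(self, name, expr_id):
        self.name = name
        self.expr_id = expr_id

    @property
    def references(self):
        # An attribute references itself.
        return {(self.name, self.expr_id)}

class Alias:
    """Wraps a child expression under a new name/id, e.g.
    cs_item_sk#39 AS cs_item_sk#39#111."""
    def __init__(self, child, name, expr_id):
        self.child = child
        self.name = name
        self.expr_id = expr_id

    @property
    def references(self):
        # The alias's references are those of the expression it wraps,
        # so the underlying attribute should still show up.
        return self.child.references

cs_item_sk_39 = Attribute("cs_item_sk", 39)
aliased = Alias(cs_item_sk_39, "cs_item_sk", 111)
assert ("cs_item_sk", 39) in aliased.references
```

Under that model, projecting `cs_item_sk#39 AS cs_item_sk#39#111` would still expose a reference to `#39`, which is why the CheckAnalysis complaint in the plan below is surprising.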

On Fri, Dec 9, 2016, 7:50 PM Nattavut Sutyanyong (JIRA) <jira@apache.org> wrote:



> CheckAnalysis rejects TPCDS query 32
> ------------------------------------
>
>                 Key: SPARK-18814
>                 URL: https://issues.apache.org/jira/browse/SPARK-18814
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Eric Liang
>            Priority: Blocker
>
> It seems the CheckAnalysis rule introduced by SPARK-18504 is incorrectly rejecting this TPCDS query, which ran fine in Spark 2.0. There doesn't seem to be any obvious error in the query or the check rule though: in the plan below, the scalar subquery's condition field is "scalar-subquery#24 [(cs_item_sk#39#111 = i_item_sk#59)]", which should reference cs_item_sk#39. Nonetheless CheckAnalysis complains that cs_item_sk#39 is not referenced by the scalar subquery predicates.
> analysis error:
> {code}
> == Query: q32-v1.4 ==
>  Can't be analyzed: org.apache.spark.sql.AnalysisException: a GROUP BY clause in a scalar correlated subquery cannot contain non-correlated columns: cs_item_sk#39;;
> GlobalLimit 100
> +- LocalLimit 100
>    +- Aggregate [sum(cs_ext_discount_amt#46) AS excess discount amount#23]
>       +- Filter ((((i_manufact_id#72 = 977) && (i_item_sk#59 = cs_item_sk#39)) && ((d_date#83 >= 2000-01-27) && (d_date#83 <= cast(cast(cast(cast(2000-01-27 as date) as timestamp) + interval 12 weeks 6 days as date) as string)))) && ((d_date_sk#81 = cs_sold_date_sk#58) && (cast(cs_ext_discount_amt#46 as decimal(14,7)) > cast(scalar-subquery#24 [(cs_item_sk#39#111 = i_item_sk#59)] as decimal(14,7)))))
>          :  +- Project [(CAST(1.3 AS DECIMAL(11,6)) * CAST(avg(cs_ext_discount_amt) AS DECIMAL(11,6)))#110, cs_item_sk#39 AS cs_item_sk#39#111]
>          :     +- Aggregate [cs_item_sk#39], [CheckOverflow((promote_precision(cast(1.3 as decimal(11,6))) * promote_precision(cast(avg(cs_ext_discount_amt#46) as decimal(11,6)))), DecimalType(14,7)) AS (CAST(1.3 AS DECIMAL(11,6)) * CAST(avg(cs_ext_discount_amt) AS DECIMAL(11,6)))#110, cs_item_sk#39]
>          :        +- Filter (((d_date#83 >= 2000-01-27]) && (d_date#83 <= cast(cast(cast(cast(2000-01-27 as date) as timestamp) + interval 12 weeks 6 days as date) as string))) && (d_date_sk#81 = cs_sold_date_sk#58))
>          :           +- Join Inner
>          :              :- SubqueryAlias catalog_sales
>          :              :  +- Relation[cs_sold_time_sk#25,cs_ship_date_sk#26,cs_bill_customer_sk#27,cs_bill_cdemo_sk#28,cs_bill_hdemo_sk#29,cs_bill_addr_sk#30,cs_ship_customer_sk#31,cs_ship_cdemo_sk#32,cs_ship_hdemo_sk#33,cs_ship_addr_sk#34,cs_call_center_sk#35,cs_catalog_page_sk#36,cs_ship_mode_sk#37,cs_warehouse_sk#38,cs_item_sk#39,cs_promo_sk#40,cs_order_number#41,cs_quantity#42,cs_wholesale_cost#43,cs_list_price#44,cs_sales_price#45,cs_ext_discount_amt#46,cs_ext_sales_price#47,cs_ext_wholesale_cost#48,... 10 more fields] parquet
>          :              +- SubqueryAlias date_dim
>          :                 +- Relation[d_date_sk#81,d_date_id#82,d_date#83,d_month_seq#84,d_week_seq#85,d_quarter_seq#86,d_year#87,d_dow#88,d_moy#89,d_dom#90,d_qoy#91,d_fy_year#92,d_fy_quarter_seq#93,d_fy_week_seq#94,d_day_name#95,d_quarter_name#96,d_holiday#97,d_weekend#98,d_following_holiday#99,d_first_dom#100,d_last_dom#101,d_same_day_ly#102,d_same_day_lq#103,d_current_day#104,... 4 more fields] parquet
>          +- Join Inner
>             :- Join Inner
>             :  :- SubqueryAlias catalog_sales
>             :  :  +- Relation[cs_sold_time_sk#25,cs_ship_date_sk#26,cs_bill_customer_sk#27,cs_bill_cdemo_sk#28,cs_bill_hdemo_sk#29,cs_bill_addr_sk#30,cs_ship_customer_sk#31,cs_ship_cdemo_sk#32,cs_ship_hdemo_sk#33,cs_ship_addr_sk#34,cs_call_center_sk#35,cs_catalog_page_sk#36,cs_ship_mode_sk#37,cs_warehouse_sk#38,cs_item_sk#39,cs_promo_sk#40,cs_order_number#41,cs_quantity#42,cs_wholesale_cost#43,cs_list_price#44,cs_sales_price#45,cs_ext_discount_amt#46,cs_ext_sales_price#47,cs_ext_wholesale_cost#48,... 10 more fields] parquet
>             :  +- SubqueryAlias item
>             :     +- Relation[i_item_sk#59,i_item_id#60,i_rec_start_date#61,i_rec_end_date#62,i_item_desc#63,i_current_price#64,i_wholesale_cost#65,i_brand_id#66,i_brand#67,i_class_id#68,i_class#69,i_category_id#70,i_category#71,i_manufact_id#72,i_manufact#73,i_size#74,i_formulation#75,i_color#76,i_units#77,i_container#78,i_manager_id#79,i_product_name#80] parquet
>             +- SubqueryAlias date_dim
>                +- Relation[d_date_sk#81,d_date_id#82,d_date#83,d_month_seq#84,d_week_seq#85,d_quarter_seq#86,d_year#87,d_dow#88,d_moy#89,d_dom#90,d_qoy#91,d_fy_year#92,d_fy_quarter_seq#93,d_fy_week_seq#94,d_day_name#95,d_quarter_name#96,d_holiday#97,d_weekend#98,d_following_holiday#99,d_first_dom#100,d_last_dom#101,d_same_day_ly#102,d_same_day_lq#103,d_current_day#104,... 4 more fields] parquet
> {code}
> query text:
> {code}
> select sum(cs_ext_discount_amt) as `excess discount amount`
>  from
>     catalog_sales, item, date_dim
>  where
>    i_manufact_id = 977
>    and i_item_sk = cs_item_sk
>    and d_date between '2000-01-27' and (cast('2000-01-27' as date) + interval 90 days)
>    and d_date_sk = cs_sold_date_sk
>    and cs_ext_discount_amt > (
>           select 1.3 * avg(cs_ext_discount_amt)
>           from catalog_sales, date_dim
>           where cs_item_sk = i_item_sk
>            and d_date between '2000-01-27]' and (cast('2000-01-27' as date) + interval 90 days)
>            and d_date_sk = cs_sold_date_sk)
> limit 100
> {code}
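The shape of the check being discussed can be sketched as follows. This is a hypothetical simplification of the kind of rule SPARK-18504 added, not the actual CheckAnalysis code: every GROUP BY column of a correlated scalar subquery must appear in a correlated predicate, and the failure mode described above would occur if the correlated-column set is computed from the aliased output (`cs_item_sk#39#111`) rather than the underlying attribute (`cs_item_sk#39`):

```python
# Hypothetical simplification of the SPARK-18504-style check, for
# illustration only. Column names are plain strings here.

def check_scalar_subquery(group_by_cols, correlated_cols):
    """Reject a correlated scalar subquery whose GROUP BY contains a
    column that never appears in a correlated predicate."""
    bad = [c for c in group_by_cols if c not in correlated_cols]
    if bad:
        raise ValueError(
            "a GROUP BY clause in a scalar correlated subquery "
            "cannot contain non-correlated columns: " + ", ".join(bad))

# If the correlated set were derived from the alias rather than the
# attribute it wraps, cs_item_sk#39 would look non-correlated and the
# check would fire, matching the error in the plan above:
try:
    check_scalar_subquery(["cs_item_sk#39"], ["cs_item_sk#39#111"])
except ValueError as e:
    print(e)

# With the underlying attribute in the correlated set, the query passes:
check_scalar_subquery(["cs_item_sk#39"], ["cs_item_sk#39"])
```

Whether the alias/attribute mismatch is actually what trips the real rule is exactly the open question in this thread.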



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

