hive-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mostafa Mokhtar (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-8767) CBO : Join on inequality results in in-correct join order
Date Thu, 06 Nov 2014 22:15:33 GMT
Mostafa Mokhtar created HIVE-8767:
-------------------------------------

             Summary: CBO : Join on inequality results in in-correct join order
                 Key: HIVE-8767
                 URL: https://issues.apache.org/jira/browse/HIVE-8767
             Project: Hive
          Issue Type: Bug
          Components: CBO
    Affects Versions: 0.14.0
            Reporter: Mostafa Mokhtar
            Assignee: Laljo John Pullokkaran
             Fix For: 0.15.0


Queries with hybrid joins (inner join and in-equality join) produce inefficient join order.
CBO joins the two tables involved in the in-equality join first then the remaining joins are
considered.
The problem with that selectivity of the other joins is not taken into consideration.

Queries that are affected by this are Q64 and Q72 from TPC-DS

Logical plan for Q72
{code}
2014-11-06 14:13:12,169 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631))
- Plan After Join Reordering:
HiveSortRel(fetch=[100]): rowcount = 139827.8175849229, cumulative cost = {7.195499709572942E13
rows, 279655.6351698458 cpu, 0.0 io}, id = 2037
  HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC],
dir3=[ASC]): rowcount = 139827.8175849229, cumulative cost = {7.195497058847592E13 rows, 139827.8175849229
cpu, 0.0 io}, id = 2035
    HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]):
rowcount = 139827.8175849229, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io},
id = 2033
      HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 139827.8175849229,
cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2031
        HiveProjectRel($f0=[$13], $f1=[$11], $f2=[$20]): rowcount = 106451.860966184, cumulative
cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2029
          HiveFilterRel(condition=[>(CAST($25):DOUBLE, +(CAST($19):DOUBLE, CAST(5):DOUBLE))]):
rowcount = 106451.860966184, cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io},
id = 2027
            HiveProjectRel(cs_ship_date_sk=[$2], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4],
cs_item_sk=[$5], cs_quantity=[$6], cs_sold_date_sk=[$7], inv_item_sk=[$8], inv_warehouse_sk=[$9],
inv_quantity_on_hand=[$10], inv_date_sk=[$11], w_warehouse_sk=[$24], w_warehouse_name=[$25],
i_item_sk=[$0], i_item_desc=[$1], cd_demo_sk=[$20], cd_marital_status=[$21], hd_demo_sk=[$18],
hd_buy_potential=[$19], d_date_sk=[$12], d_date=[$13], d_week_seq=[$14], d_year=[$15], d_date_sk0=[$16],
d_week_seq0=[$17], d_date_sk1=[$22], d_date0=[$23]): rowcount = 319355.582898552, cumulative
cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2451
              HiveJoinRel(condition=[=($24, $9)], joinType=[inner]): rowcount = 319355.582898552,
cumulative cost = {7.195494408122242E13 rows, 0.0 cpu, 0.0 io}, id = 2449
                HiveJoinRel(condition=[=($2, $22)], joinType=[inner]): rowcount = 319355.582898552,
cumulative cost = {7.195494376183984E13 rows, 0.0 cpu, 0.0 io}, id = 2447
                  HiveJoinRel(condition=[=($0, $5)], joinType=[inner]): rowcount = 319355.582898552,
cumulative cost = {7.195494336943527E13 rows, 0.0 cpu, 0.0 io}, id = 2445
                    HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1997
                      HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount
= 462000.0, cumulative cost = {0}, id = 1645
                    HiveJoinRel(condition=[=($1, $18)], joinType=[inner]): rowcount = 319355.582898552,
cumulative cost = {7.195494258807969E13 rows, 0.0 cpu, 0.0 io}, id = 2443
                      HiveJoinRel(condition=[=($2, $16)], joinType=[inner]): rowcount = 2235489.080289864,
cumulative cost = {7.195494007819061E13 rows, 0.0 cpu, 0.0 io}, id = 2441
                        HiveJoinRel(condition=[AND(=($9, $14), =($12, $15))], joinType=[inner]):
rowcount = 5.588722700724658E7, cumulative cost = {7.195488419067561E13 rows, 0.0 cpu, 0.0
io}, id = 2439
                          HiveJoinRel(condition=[=($5, $10)], joinType=[inner]): rowcount
= 5.732184228903609E9, cumulative cost = {7.19491519333977E13 rows, 0.0 cpu, 0.0 io}, id =
2080
                            HiveFilterRel(condition=[<($8, $4)]): rowcount = 7.190451896736688E13,
cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 1991
                              HiveProjectRel(cs_ship_date_sk=[$0], cs_bill_cdemo_sk=[$1],
cs_bill_hdemo_sk=[$2], cs_item_sk=[$3], cs_quantity=[$4], cs_sold_date_sk=[$5], inv_item_sk=[$6],
inv_warehouse_sk=[$7], inv_quantity_on_hand=[$8], inv_date_sk=[$9]): rowcount = 2.1571355690210062E14,
cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 2072
                                HiveJoinRel(condition=[=($3, $6)], joinType=[inner]): rowcount
= 2.1571355690210062E14, cumulative cost = {4.4632966025E10 rows, 0.0 cpu, 0.0 io}, id = 2067
                                  HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3],
cs_bill_hdemo_sk=[$4], cs_item_sk=[$14], cs_quantity=[$17], cs_sold_date_sk=[$33]): rowcount
= 4.3005109025E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1925
                                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]):
rowcount = 4.3005109025E10, cumulative cost = {0}, id = 1642
                                  HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1],
inv_quantity_on_hand=[$2], inv_date_sk=[$3]): rowcount = 1.627857E9, cumulative cost = {0.0
rows, 0.0 cpu, 0.0 io}, id = 1928
                                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]):
rowcount = 1.627857E9, cumulative cost = {0}, id = 1644
                            HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4], d_year=[$6]):
rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2015
                              HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]): rowcount
= 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2013
                                HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]):
rowcount = 73049.0, cumulative cost = {0}, id = 1640
                          HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2019
                            HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]):
rowcount = 73049.0, cumulative cost = {0}, id = 1640
                        HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount =
288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2009
                          HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]):
rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2007
                            HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]):
rowcount = 7200.0, cumulative cost = {0}, id = 1639
                      HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount =
274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2003
                        HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0, cumulative
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2001
                          HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]):
rowcount = 1920800.0, cumulative cost = {0}, id = 1646
                  HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2023
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]):
rowcount = 73049.0, cumulative cost = {0}, id = 1640
                HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount = 27.0,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 1993
                  HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]): rowcount
= 27.0, cumulative cost = {0}, id = 1647
{code}

Logical plan for Q72 if the join order in the query is changed
{code}
2014-11-06 14:14:54,469 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631))
- Plan After Join Reordering:
HiveSortRel(fetch=[100]): rowcount = 139827.81758492283, cumulative cost = {4.472416182114529E10
rows, 279655.63516984566 cpu, 0.0 io}, id = 2860
  HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC],
dir3=[ASC]): rowcount = 139827.81758492283, cumulative cost = {4.469765456764418E10 rows,
139827.81758492283 cpu, 0.0 io}, id = 2858
    HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]):
rowcount = 139827.81758492283, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0
io}, id = 2856
      HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 139827.81758492283,
cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2854
        HiveProjectRel($f0=[$21], $f1=[$19], $f2=[$8]): rowcount = 106451.86096618396, cumulative
cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 2852
          HiveFilterRel(condition=[>(CAST($25):DOUBLE, +(CAST($7):DOUBLE, CAST(5):DOUBLE))]):
rowcount = 106451.86096618396, cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0
io}, id = 2850
            HiveProjectRel(cs_ship_date_sk=[$2], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4],
cs_item_sk=[$5], cs_quantity=[$6], cs_sold_date_sk=[$7], d_date_sk=[$8], d_date=[$9], d_week_seq=[$10],
d_year=[$11], cd_demo_sk=[$12], cd_marital_status=[$13], hd_demo_sk=[$14], hd_buy_potential=[$15],
inv_item_sk=[$16], inv_warehouse_sk=[$17], inv_quantity_on_hand=[$18], inv_date_sk=[$19],
w_warehouse_sk=[$24], w_warehouse_name=[$25], i_item_sk=[$0], i_item_desc=[$1], d_date_sk0=[$20],
d_week_seq0=[$21], d_date_sk1=[$22], d_date0=[$23]): rowcount = 319355.5828985519, cumulative
cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 3081
              HiveJoinRel(condition=[=($24, $17)], joinType=[inner]): rowcount = 319355.5828985519,
cumulative cost = {4.4671147314143074E10 rows, 0.0 cpu, 0.0 io}, id = 3079
                HiveJoinRel(condition=[=($2, $22)], joinType=[inner]): rowcount = 319355.5828985519,
cumulative cost = {4.467082793156017E10 rows, 0.0 cpu, 0.0 io}, id = 3077
                  HiveJoinRel(condition=[=($0, $5)], joinType=[inner]): rowcount = 319355.5828985519,
cumulative cost = {4.467043552697727E10 rows, 0.0 cpu, 0.0 io}, id = 3075
                    HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2838
                      HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]): rowcount
= 462000.0, cumulative cost = {0}, id = 2462
                    HiveJoinRel(condition=[AND(=($17, $18), =($8, $19))], joinType=[inner]):
rowcount = 319355.5828985519, cumulative cost = {4.466965417139437E10 rows, 0.0 cpu, 0.0 io},
id = 2991
                      HiveFilterRel(condition=[<($16, $4)]): rowcount = 3.2755338450877763E7,
cumulative cost = {4.46368257839435E10 rows, 0.0 cpu, 0.0 io}, id = 2832
                        HiveProjectRel(cs_ship_date_sk=[$6], cs_bill_cdemo_sk=[$7], cs_bill_hdemo_sk=[$8],
cs_item_sk=[$9], cs_quantity=[$10], cs_sold_date_sk=[$11], d_date_sk=[$12], d_date=[$13],
d_week_seq=[$14], d_year=[$15], cd_demo_sk=[$4], cd_marital_status=[$5], hd_demo_sk=[$16],
hd_buy_potential=[$17], inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2],
inv_date_sk=[$3]): rowcount = 9.82660153526333E7, cumulative cost = {4.46368257839435E10 rows,
0.0 cpu, 0.0 io}, id = 2987
                          HiveJoinRel(condition=[=($9, $0)], joinType=[inner]): rowcount =
9.82660153526333E7, cumulative cost = {4.46368257839435E10 rows, 0.0 cpu, 0.0 io}, id = 2985
                            HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2],
inv_date_sk=[$3]): rowcount = 1.627857E9, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id
= 2828
                              HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]):
rowcount = 1.627857E9, cumulative cost = {0}, id = 2461
                            HiveJoinRel(condition=[=($3, $0)], joinType=[inner]): rowcount
= 19590.519781796647, cumulative cost = {4.300894919342371E10 rows, 0.0 cpu, 0.0 io}, id =
2983
                              HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount
= 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2818
                                HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2816
                                  HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]):
rowcount = 1920800.0, cumulative cost = {0}, id = 2463
                              HiveJoinRel(condition=[=($2, $10)], joinType=[inner]): rowcount
= 137133.63847257654, cumulative cost = {4.300853765978524E10 rows, 0.0 cpu, 0.0 io}, id =
2981
                                HiveJoinRel(condition=[=($5, $6)], joinType=[inner]): rowcount
= 3428340.961814413, cumulative cost = {4.3005109030823425E10 rows, 0.0 cpu, 0.0 io}, id =
2901
                                  HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3],
cs_bill_hdemo_sk=[$4], cs_item_sk=[$14], cs_quantity=[$17], cs_sold_date_sk=[$33]): rowcount
= 4.3005109025E10, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2808
                                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]):
rowcount = 4.3005109025E10, cumulative cost = {0}, id = 2459
                                  HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4],
d_year=[$6]): rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io},
id = 2812
                                    HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]):
rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2810
                                      HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]):
rowcount = 73049.0, cumulative cost = {0}, id = 2457
                                HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount
= 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2824
                                  HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]):
rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2822
                                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]):
rowcount = 7200.0, cumulative cost = {0}, id = 2456
                      HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2842
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]):
rowcount = 73049.0, cumulative cost = {0}, id = 2457
                  HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2846
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]):
rowcount = 73049.0, cumulative cost = {0}, id = 2457
                HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount = 27.0,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 2834
                  HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]): rowcount
= 27.0, cumulative cost = {0}, id = 2464
{code}

Logical plan after replacing "inv_quantity_on_hand < cs_quantity"  with inv_quantity_on_hand
= 528
{code}
2014-11-06 14:18:17,149 DEBUG [main]: parse.SemanticAnalyzer (SemanticAnalyzer.java:apply(12631))
- Plan After Join Reordering:
HiveSortRel(fetch=[100]): rowcount = 863.6667057213618, cumulative cost = {4.301228282853069E10
rows, 1727.3334114427237 cpu, 0.0 io}, id = 4382
  HiveSortRel(sort0=[$3], sort1=[$0], sort2=[$1], sort3=[$2], dir0=[DESC], dir1=[ASC], dir2=[ASC],
dir3=[ASC]): rowcount = 863.6667057213618, cumulative cost = {4.301218939793791E10 rows, 863.6667057213618
cpu, 0.0 io}, id = 4380
    HiveProjectRel(i_item_desc=[$0], w_warehouse_name=[$1], d_week_seq=[$2], total_cnt=[$3]):
rowcount = 863.6667057213618, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io},
id = 4378
      HiveAggregateRel(group=[{0, 1, 2}], agg#0=[count()]): rowcount = 863.6667057213618,
cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4376
        HiveProjectRel($f0=[$12], $f1=[$10], $f2=[$19]): rowcount = 462.4898665205019, cumulative
cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4374
          HiveFilterRel(condition=[>(CAST($24):DOUBLE, +(CAST($18):DOUBLE, CAST(5):DOUBLE))]):
rowcount = 462.4898665205019, cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io},
id = 4372
            HiveProjectRel(cs_ship_date_sk=[$12], cs_bill_cdemo_sk=[$13], cs_bill_hdemo_sk=[$14],
cs_item_sk=[$15], cs_sold_date_sk=[$16], inv_item_sk=[$2], inv_warehouse_sk=[$3], inv_quantity_on_hand=[$4],
inv_date_sk=[$5], w_warehouse_sk=[$6], w_warehouse_name=[$7], i_item_sk=[$8], i_item_desc=[$9],
cd_demo_sk=[$10], cd_marital_status=[$11], hd_demo_sk=[$21], hd_buy_potential=[$22], d_date_sk=[$17],
d_date=[$18], d_week_seq=[$19], d_year=[$20], d_date_sk0=[$23], d_week_seq0=[$24], d_date_sk1=[$0],
d_date0=[$1]): rowcount = 1387.4695995615057, cumulative cost = {4.301209596734513E10 rows,
0.0 cpu, 0.0 io}, id = 4853
              HiveJoinRel(condition=[=($12, $0)], joinType=[inner]): rowcount = 1387.4695995615057,
cumulative cost = {4.301209596734513E10 rows, 0.0 cpu, 0.0 io}, id = 4851
                HiveProjectRel(d_date_sk=[$0], d_date=[$2]): rowcount = 73049.0, cumulative
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4368
                  HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]): rowcount
= 73049.0, cumulative cost = {0}, id = 3975
                HiveJoinRel(condition=[AND(=($3, $21), =($17, $22))], joinType=[inner]): rowcount
= 1387.4695995615057, cumulative cost = {4.3012021530875534E10 rows, 0.0 cpu, 0.0 io}, id
= 4849
                  HiveJoinRel(condition=[=($13, $0)], joinType=[inner]): rowcount = 142308.57000041206,
cumulative cost = {4.3011806173305534E10 rows, 0.0 cpu, 0.0 io}, id = 4847
                    HiveJoinRel(condition=[=($4, $1)], joinType=[inner]): rowcount = 2355798.8422575975,
cumulative cost = {2355825.8422575975 rows, 0.0 cpu, 0.0 io}, id = 4460
                      HiveProjectRel(inv_item_sk=[$0], inv_warehouse_sk=[$1], inv_quantity_on_hand=[$2],
inv_date_sk=[$3]): rowcount = 2355798.8422575975, cumulative cost = {0.0 rows, 0.0 cpu, 0.0
io}, id = 4334
                        HiveFilterRel(condition=[=($2, 528)]): rowcount = 2355798.8422575975,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4332
                          HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.inventory]]):
rowcount = 1.627857E9, cumulative cost = {0}, id = 3979
                      HiveProjectRel(w_warehouse_sk=[$0], w_warehouse_name=[$2]): rowcount
= 27.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4338
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.warehouse]]):
rowcount = 27.0, cumulative cost = {0}, id = 3982
                    HiveJoinRel(condition=[=($0, $7)], joinType=[inner]): rowcount = 19590.519781796647,
cumulative cost = {4.30094307839435E10 rows, 0.0 cpu, 0.0 io}, id = 4845
                      HiveProjectRel(i_item_sk=[$0], i_item_desc=[$4]): rowcount = 462000.0,
cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4342
                        HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.item]]):
rowcount = 462000.0, cumulative cost = {0}, id = 3980
                      HiveJoinRel(condition=[=($3, $0)], joinType=[inner]): rowcount = 19590.519781796647,
cumulative cost = {4.300894919342371E10 rows, 0.0 cpu, 0.0 io}, id = 4843
                        HiveProjectRel(cd_demo_sk=[$0], cd_marital_status=[$2]): rowcount
= 274400.0, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4348
                          HiveFilterRel(condition=[=($2, 'M')]): rowcount = 274400.0, cumulative
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4346
                            HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.customer_demographics]]):
rowcount = 1920800.0, cumulative cost = {0}, id = 3981
                        HiveJoinRel(condition=[=($2, $9)], joinType=[inner]): rowcount = 137133.63847257654,
cumulative cost = {4.300853765978524E10 rows, 0.0 cpu, 0.0 io}, id = 4841
                          HiveJoinRel(condition=[=($4, $5)], joinType=[inner]): rowcount =
3428340.961814413, cumulative cost = {4.3005109030823425E10 rows, 0.0 cpu, 0.0 io}, id = 4428
                            HiveProjectRel(cs_ship_date_sk=[$1], cs_bill_cdemo_sk=[$3], cs_bill_hdemo_sk=[$4],
cs_item_sk=[$14], cs_sold_date_sk=[$33]): rowcount = 4.3005109025E10, cumulative cost = {0.0
rows, 0.0 cpu, 0.0 io}, id = 4330
                              HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.catalog_sales]]):
rowcount = 4.3005109025E10, cumulative cost = {0}, id = 3977
                            HiveProjectRel(d_date_sk=[$0], d_date=[$2], d_week_seq=[$4], d_year=[$6]):
rowcount = 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4360
                              HiveFilterRel(condition=[AND(=($6, 2001), =($6, 2001))]): rowcount
= 5.823421556122448, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4358
                                HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]):
rowcount = 73049.0, cumulative cost = {0}, id = 3975
                          HiveProjectRel(hd_demo_sk=[$0], hd_buy_potential=[$2]): rowcount
= 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4354
                            HiveFilterRel(condition=[AND(=($2, '1001-5000'), =($2, '1001-5000'))]):
rowcount = 288.00000000000006, cumulative cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4352
                              HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.household_demographics]]):
rowcount = 7200.0, cumulative cost = {0}, id = 3974
                  HiveProjectRel(d_date_sk=[$0], d_week_seq=[$4]): rowcount = 73049.0, cumulative
cost = {0.0 rows, 0.0 cpu, 0.0 io}, id = 4364
                    HiveTableScanRel(table=[[tpcds_bin_partitioned_orc_30000.date_dim]]):
rowcount = 73049.0, cumulative cost = {0}, id = 3975
{code}

Q72
{code}
select  i_item_desc
      ,w_warehouse_name
      ,d1.d_week_seq
      ,count(case when p_promo_sk is null then 1 else 0 end) no_promo
      ,count(case when p_promo_sk is not null then 1 else 0 end) promo
      ,count(*) total_cnt
from catalog_sales
join inventory on (catalog_sales.cs_item_sk = inventory.inv_item_sk)
join warehouse on (warehouse.w_warehouse_sk=inventory.inv_warehouse_sk)
join item on (item.i_item_sk = catalog_sales.cs_item_sk)
join customer_demographics on (catalog_sales.cs_bill_cdemo_sk = customer_demographics.cd_demo_sk)
join household_demographics on (catalog_sales.cs_bill_hdemo_sk = household_demographics.hd_demo_sk)
join date_dim d1 on (catalog_sales.cs_sold_date_sk = d1.d_date_sk)
join date_dim d2 on (inventory.inv_date_sk = d2.d_date_sk)
join date_dim d3 on (catalog_sales.cs_ship_date_sk = d3.d_date_sk)
left outer join promotion on (catalog_sales.cs_promo_sk=promotion.p_promo_sk)
left outer join catalog_returns on (catalog_returns.cr_item_sk = catalog_sales.cs_item_sk
and catalog_returns.cr_order_number = catalog_sales.cs_order_number)
where d1.d_week_seq = d2.d_week_seq
  and inv_quantity_on_hand < cs_quantity 
  and d3.d_date > d1.d_date + 5
  and hd_buy_potential = '1001-5000'
  and d1.d_year = 2001
  and hd_buy_potential = '1001-5000'
  and cd_marital_status = 'M'
  and d1.d_year = 2001
group by i_item_desc,w_warehouse_name,d1.d_week_seq
order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq
limit 100
{code}

Q64
{code}
select i_product_name as product_name ,i_item_sk as item_sk ,s_store_name as store_name
     ,s_zip as store_zip ,ad1.ca_street_number as b_street_number ,ad1.ca_street_name as b_streen_name
     ,ad1.ca_city as b_city ,ad1.ca_zip as b_zip ,ad2.ca_street_number as c_street_number
     ,ad2.ca_street_name as c_street_name ,ad2.ca_city as c_city ,ad2.ca_zip as c_zip
     ,d1.d_year as syear ,d2.d_year as fsyear ,d3.d_year as s2year ,count(*) as cnt
     ,sum(ss_wholesale_cost) as s1 ,sum(ss_list_price) as s2 ,sum(ss_coupon_amt) as s3
  FROM   store_sales
        JOIN store_returns ON store_sales.ss_item_sk = store_returns.sr_item_sk and store_sales.ss_ticket_number
= store_returns.sr_ticket_number
        JOIN customer ON store_sales.ss_customer_sk = customer.c_customer_sk
        JOIN date_dim d1 ON store_sales.ss_sold_date_sk = d1.d_date_sk
        JOIN date_dim d2 ON customer.c_first_sales_date_sk = d2.d_date_sk 
        JOIN date_dim d3 ON customer.c_first_shipto_date_sk = d3.d_date_sk
        JOIN store ON store_sales.ss_store_sk = store.s_store_sk
        JOIN customer_demographics cd1 ON store_sales.ss_cdemo_sk= cd1.cd_demo_sk
        JOIN customer_demographics cd2 ON customer.c_current_cdemo_sk = cd2.cd_demo_sk
        JOIN promotion ON store_sales.ss_promo_sk = promotion.p_promo_sk
        JOIN household_demographics hd1 ON store_sales.ss_hdemo_sk = hd1.hd_demo_sk
        JOIN household_demographics hd2 ON customer.c_current_hdemo_sk = hd2.hd_demo_sk
        JOIN customer_address ad1 ON store_sales.ss_addr_sk = ad1.ca_address_sk
        JOIN customer_address ad2 ON customer.c_current_addr_sk = ad2.ca_address_sk
        JOIN income_band ib1 ON hd1.hd_income_band_sk = ib1.ib_income_band_sk
        JOIN income_band ib2 ON hd2.hd_income_band_sk = ib2.ib_income_band_sk
        JOIN item ON store_sales.ss_item_sk = item.i_item_sk
        JOIN
 (select cs_item_sk
        ,sum(cs_ext_list_price) as sale,sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit)
as refund
  from catalog_sales JOIN catalog_returns
  ON catalog_sales.cs_item_sk = catalog_returns.cr_item_sk
    and catalog_sales.cs_order_number = catalog_returns.cr_order_number
  group by cs_item_sk
  having sum(cs_ext_list_price)>2*sum(cr_refunded_cash+cr_reversed_charge+cr_store_credit))
cs_ui
ON store_sales.ss_item_sk = cs_ui.cs_item_sk
  WHERE  
         cd1.cd_marital_status <> cd2.cd_marital_status and
         i_color in ('maroon','burnished','dim','steel','navajo','chocolate') and
         i_current_price between 35 and 35 + 10 and
         i_current_price between 35 + 1 and 35 + 15
group by i_product_name ,i_item_sk ,s_store_name ,s_zip ,ad1.ca_street_number
       ,ad1.ca_street_name ,ad1.ca_city ,ad1.ca_zip ,ad2.ca_street_number
       ,ad2.ca_street_name ,ad2.ca_city ,ad2.ca_zip ,d1.d_year ,d2.d_year ,d3.d_year
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message