hive-dev mailing list archives

From "Hari Sankar Sivarama Subramaniyan (JIRA)" <j...@apache.org>
Subject [jira] [Created] (HIVE-12647) hive.mapred.mode=strict throws an error even if the final plan does not have cartesian product in it.
Date Thu, 10 Dec 2015 21:24:10 GMT
Hari Sankar Sivarama Subramaniyan created HIVE-12647:
--------------------------------------------------------

             Summary: hive.mapred.mode=strict throws an error even if the final plan does not have cartesian product in it.
                 Key: HIVE-12647
                 URL: https://issues.apache.org/jira/browse/HIVE-12647
             Project: Hive
          Issue Type: Bug
            Reporter: Hari Sankar Sivarama Subramaniyan


{code}
Vertex dependency in root stage
Reducer 10 <- Reducer 9 (SIMPLE_EDGE)
Reducer 2 <- Map 1 (SIMPLE_EDGE), Map 11 (SIMPLE_EDGE)
Reducer 3 <- Map 12 (SIMPLE_EDGE), Reducer 2 (SIMPLE_EDGE)
Reducer 4 <- Map 13 (SIMPLE_EDGE), Reducer 3 (SIMPLE_EDGE)
Reducer 5 <- Map 14 (SIMPLE_EDGE), Reducer 4 (SIMPLE_EDGE)
Reducer 6 <- Map 15 (SIMPLE_EDGE), Reducer 5 (SIMPLE_EDGE)
Reducer 7 <- Map 16 (SIMPLE_EDGE), Reducer 6 (SIMPLE_EDGE)
Reducer 8 <- Map 17 (SIMPLE_EDGE), Reducer 7 (SIMPLE_EDGE)
Reducer 9 <- Reducer 8 (SIMPLE_EDGE)

Stage-0
   Fetch Operator
      limit:100
      Stage-1
         Reducer 10
         File Output Operator [FS_63]
            compressed:false
            Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE Column stats: NONE
            table:{"input format:":"org.apache.hadoop.mapred.TextInputFormat","output format:":"org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat","serde:":"org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"}
            Limit [LIM_62]
               Number of rows:100
               Statistics:Num rows: 100 Data size: 143600 Basic stats: COMPLETE Column stats: NONE
               Select Operator [SEL_61]
               |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11","_col12","_col13","_col14"]
               |  Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
               |<-Reducer 9 [SIMPLE_EDGE]
                  Reduce Output Operator [RS_60]
                     key expressions:_col0 (type: string), _col1 (type: string), _col2 (type: string)
                     sort order:+++
                     Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
                     value expressions:_col3 (type: bigint), _col4 (type: double), _col5 (type: double), _col6 (type: double), _col7 (type: bigint), _col8 (type: double), _col9 (type: double), _col10 (type: double), _col11 (type: bigint), _col12 (type: double), _col13 (type: double)
                     Select Operator [SEL_58]
                        outputColumnNames:["_col0","_col1","_col10","_col11","_col12","_col13","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9"]
                        Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
                        Group By Operator [GBY_57]
                        |  aggregations:["count(VALUE._col0)","avg(VALUE._col1)","stddev_samp(VALUE._col2)","count(VALUE._col3)","avg(VALUE._col4)","stddev_samp(VALUE._col5)","count(VALUE._col6)","avg(VALUE._col7)","stddev_samp(VALUE._col8)"]
                        |  keys:KEY._col0 (type: string), KEY._col1 (type: string), KEY._col2 (type: string)
                        |  outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
                        |  Statistics:Num rows: 127050 Data size: 182479129 Basic stats: COMPLETE Column stats: NONE
                        |<-Reducer 8 [SIMPLE_EDGE]
                           Reduce Output Operator [RS_56]
                              key expressions:_col0 (type: string), _col1 (type: string),
_col2 (type: string)
                              Map-reduce partition columns:_col0 (type: string), _col1 (type:
string), _col2 (type: string)
                              sort order:+++
                              Statistics:Num rows: 254100 Data size: 364958258 Basic stats:
COMPLETE Column stats: NONE
                              value expressions:_col3 (type: bigint), _col4 (type: struct<count:bigint,sum:double,input:int>),
_col5 (type: struct<count:bigint,sum:double,variance:double>), _col6 (type: bigint),
_col7 (type: struct<count:bigint,sum:double,input:int>), _col8 (type: struct<count:bigint,sum:double,variance:double>),
_col9 (type: bigint), _col10 (type: struct<count:bigint,sum:double,input:int>), _col11
(type: struct<count:bigint,sum:double,variance:double>)
                              Group By Operator [GBY_55]
                                 aggregations:["count(_col5)","avg(_col5)","stddev_samp(_col5)","count(_col10)","avg(_col10)","stddev_samp(_col10)","count(_col14)","avg(_col14)","stddev_samp(_col14)"]
                                 keys:_col22 (type: string), _col24 (type: string), _col25
(type: string)
                                 outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5","_col6","_col7","_col8","_col9","_col10","_col11"]
                                 Statistics:Num rows: 254100 Data size: 364958258 Basic stats:
COMPLETE Column stats: NONE
                                 Select Operator [SEL_54]
                                    outputColumnNames:["_col22","_col24","_col25","_col5","_col10","_col14"]
                                    Statistics:Num rows: 254100 Data size: 364958258 Basic
stats: COMPLETE Column stats: NONE
                                    Merge Join Operator [MERGEJOIN_113]
                                    |  condition map:[{"":"Inner Join 0 to 1"}]
                                    |  keys:{"0":"_col1 (type: int)","1":"_col0 (type: int)"}
                                    |  outputColumnNames:["_col5","_col10","_col14","_col22","_col24","_col25"]
                                    |  Statistics:Num rows: 254100 Data size: 364958258 Basic
stats: COMPLETE Column stats: NONE
                                    |<-Map 17 [SIMPLE_EDGE]
                                    |  Reduce Output Operator [RS_52]
                                    |     key expressions:_col0 (type: int)
                                    |     Map-reduce partition columns:_col0 (type: int)
                                    |     sort order:+
                                    |     Statistics:Num rows: 231000 Data size: 331780228
Basic stats: COMPLETE Column stats: NONE
                                    |     value expressions:_col1 (type: string), _col2 (type:
string)
                                    |     Select Operator [SEL_18]
                                    |        outputColumnNames:["_col0","_col1","_col2"]
                                    |        Statistics:Num rows: 231000 Data size: 331780228
Basic stats: COMPLETE Column stats: NONE
                                    |        Filter Operator [FIL_106]
                                    |           predicate:i_item_sk is not null (type: boolean)
                                    |           Statistics:Num rows: 231000 Data size: 331780228
Basic stats: COMPLETE Column stats: NONE
                                    |           TableScan [TS_17]
                                    |              alias:item
                                    |              Statistics:Num rows: 462000 Data size:
663560457 Basic stats: COMPLETE Column stats: NONE
                                    |<-Reducer 7 [SIMPLE_EDGE]
                                       Reduce Output Operator [RS_50]
                                          key expressions:_col1 (type: int)
                                          Map-reduce partition columns:_col1 (type: int)
                                          sort order:+
                                          Statistics:Num rows: 26735 Data size: 29919145 Basic
stats: COMPLETE Column stats: NONE
                                          value expressions:_col5 (type: int), _col10 (type:
int), _col14 (type: int), _col22 (type: string)
                                          Merge Join Operator [MERGEJOIN_112]
                                          |  condition map:[{"":"Inner Join 0 to 1"}]
                                          |  keys:{"0":"_col3 (type: int)","1":"_col0 (type:
int)"}
                                          |  outputColumnNames:["_col1","_col5","_col10","_col14","_col22"]
                                          |  Statistics:Num rows: 26735 Data size: 29919145
Basic stats: COMPLETE Column stats: NONE
                                          |<-Map 16 [SIMPLE_EDGE]
                                          |  Reduce Output Operator [RS_47]
                                          |     key expressions:_col0 (type: int)
                                          |     Map-reduce partition columns:_col0 (type:
int)
                                          |     sort order:+
                                          |     Statistics:Num rows: 852 Data size: 1628138
Basic stats: COMPLETE Column stats: NONE
                                          |     value expressions:_col1 (type: string)
                                          |     Select Operator [SEL_16]
                                          |        outputColumnNames:["_col0","_col1"]
                                          |        Statistics:Num rows: 852 Data size: 1628138
Basic stats: COMPLETE Column stats: NONE
                                          |        Filter Operator [FIL_105]
                                          |           predicate:s_store_sk is not null (type:
boolean)
                                          |           Statistics:Num rows: 852 Data size:
1628138 Basic stats: COMPLETE Column stats: NONE
                                          |           TableScan [TS_15]
                                          |              alias:store
                                          |              Statistics:Num rows: 1704 Data size:
3256276 Basic stats: COMPLETE Column stats: NONE
                                          |<-Reducer 6 [SIMPLE_EDGE]
                                             Reduce Output Operator [RS_45]
                                                key expressions:_col3 (type: int)
                                                Map-reduce partition columns:_col3 (type:
int)
                                                sort order:+
                                                Statistics:Num rows: 24305 Data size: 27199223
Basic stats: COMPLETE Column stats: NONE
                                                value expressions:_col1 (type: int), _col5
(type: int), _col10 (type: int), _col14 (type: int)
                                                Merge Join Operator [MERGEJOIN_111]
                                                |  condition map:[{"":"Inner Join 0 to 1"}]
                                                |  keys:{"0":"_col11 (type: int)","1":"_col0
(type: int)"}
                                                |  outputColumnNames:["_col1","_col3","_col5","_col10","_col14"]
                                                |  Statistics:Num rows: 24305 Data size: 27199223
Basic stats: COMPLETE Column stats: NONE
                                                |<-Map 15 [SIMPLE_EDGE]
                                                |  Reduce Output Operator [RS_42]
                                                |     key expressions:_col0 (type: int)
                                                |     Map-reduce partition columns:_col0 (type:
int)
                                                |     sort order:+
                                                |     Statistics:Num rows: 18262 Data size:
20435178 Basic stats: COMPLETE Column stats: NONE
                                                |     Select Operator [SEL_14]
                                                |        outputColumnNames:["_col0"]
                                                |        Statistics:Num rows: 18262 Data size:
20435178 Basic stats: COMPLETE Column stats: NONE
                                                |        Filter Operator [FIL_104]
                                                |           predicate:((d_quarter_name) IN
('2000Q1', '2000Q2', '2000Q3') and d_date_sk is not null) (type: boolean)
                                                |           Statistics:Num rows: 18262 Data
size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                |           TableScan [TS_12]
                                                |              alias:d1
                                                |              Statistics:Num rows: 73049
Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                                                |<-Reducer 5 [SIMPLE_EDGE]
                                                   Reduce Output Operator [RS_40]
                                                      key expressions:_col11 (type: int)
                                                      Map-reduce partition columns:_col11
(type: int)
                                                      sort order:+
                                                      Statistics:Num rows: 22096 Data size:
24726566 Basic stats: COMPLETE Column stats: NONE
                                                      value expressions:_col1 (type: int),
_col3 (type: int), _col5 (type: int), _col10 (type: int), _col14 (type: int)
                                                      Merge Join Operator [MERGEJOIN_110]
                                                      |  condition map:[{"":"Inner Join 0
to 1"}]
                                                      |  keys:{"0":"_col6 (type: int)","1":"_col0
(type: int)"}
                                                      |  outputColumnNames:["_col1","_col3","_col5","_col10","_col11","_col14"]
                                                      |  Statistics:Num rows: 22096 Data size:
24726566 Basic stats: COMPLETE Column stats: NONE
                                                      |<-Map 14 [SIMPLE_EDGE]
                                                      |  Reduce Output Operator [RS_37]
                                                      |     key expressions:_col0 (type: int)
                                                      |     Map-reduce partition columns:_col0
(type: int)
                                                      |     sort order:+
                                                      |     Statistics:Num rows: 18262 Data
size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                      |     Select Operator [SEL_11]
                                                      |        outputColumnNames:["_col0"]
                                                      |        Statistics:Num rows: 18262
Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                      |        Filter Operator [FIL_103]
                                                      |           predicate:((d_quarter_name)
IN ('2000Q1', '2000Q2', '2000Q3') and d_date_sk is not null) (type: boolean)
                                                      |           Statistics:Num rows: 18262
Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                      |           TableScan [TS_9]
                                                      |              alias:d1
                                                      |              Statistics:Num rows:
73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                                                      |<-Reducer 4 [SIMPLE_EDGE]
                                                         Reduce Output Operator [RS_35]
                                                            key expressions:_col6 (type: int)
                                                            Map-reduce partition columns:_col6
(type: int)
                                                            sort order:+
                                                            Statistics:Num rows: 20088 Data
size: 22478696 Basic stats: COMPLETE Column stats: NONE
                                                            value expressions:_col1 (type:
int), _col3 (type: int), _col5 (type: int), _col10 (type: int), _col11 (type: int), _col14
(type: int)
                                                            Merge Join Operator [MERGEJOIN_109]
                                                            |  condition map:[{"":"Inner Join
0 to 1"}]
                                                            |  keys:{"0":"_col0 (type: int)","1":"_col0
(type: int)"}
                                                            |  outputColumnNames:["_col1","_col3","_col5","_col6","_col10","_col11","_col14"]
                                                            |  Statistics:Num rows: 20088
Data size: 22478696 Basic stats: COMPLETE Column stats: NONE
                                                            |<-Map 13 [SIMPLE_EDGE]
                                                            |  Reduce Output Operator [RS_32]
                                                            |     key expressions:_col0 (type:
int)
                                                            |     Map-reduce partition columns:_col0
(type: int)
                                                            |     sort order:+
                                                            |     Statistics:Num rows: 18262
Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                            |     Select Operator [SEL_8]
                                                            |        outputColumnNames:["_col0"]
                                                            |        Statistics:Num rows:
18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                            |        Filter Operator [FIL_102]
                                                            |           predicate:((d_quarter_name
= '2000Q1') and d_date_sk is not null) (type: boolean)
                                                            |           Statistics:Num rows:
18262 Data size: 20435178 Basic stats: COMPLETE Column stats: NONE
                                                            |           TableScan [TS_6]
                                                            |              alias:d1
                                                            |              Statistics:Num
rows: 73049 Data size: 81741831 Basic stats: COMPLETE Column stats: NONE
                                                            |<-Reducer 3 [SIMPLE_EDGE]
                                                               Reduce Output Operator [RS_30]
                                                                  key expressions:_col0 (type:
int)
                                                                  Map-reduce partition columns:_col0
(type: int)
                                                                  sort order:+
                                                                  Statistics:Num rows: 1 Data
size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  value expressions:_col1
(type: int), _col3 (type: int), _col5 (type: int), _col6 (type: int), _col10 (type: int),
_col11 (type: int), _col14 (type: int)
                                                                  Merge Join Operator [MERGEJOIN_108]
                                                                  |  condition map:[{"":"Inner
Join 0 to 1"}]
                                                                  |  keys:{"0":"_col8 (type:
int), _col7 (type: int)","1":"_col1 (type: int), _col2 (type: int)"}
                                                                  |  outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col10","_col11","_col14"]
                                                                  |  Statistics:Num rows:
1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |<-Map 12 [SIMPLE_EDGE]
                                                                  |  Reduce Output Operator
[RS_27]
                                                                  |     key expressions:_col1
(type: int), _col2 (type: int)
                                                                  |     Map-reduce partition
columns:_col1 (type: int), _col2 (type: int)
                                                                  |     sort order:++
                                                                  |     Statistics:Num rows:
1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |     value expressions:_col0
(type: int), _col3 (type: int)
                                                                  |     Select Operator [SEL_5]
                                                                  |        outputColumnNames:["_col0","_col1","_col2","_col3"]
                                                                  |        Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |        Filter Operator
[FIL_101]
                                                                  |           predicate:((cs_bill_customer_sk
is not null and cs_item_sk is not null) and cs_sold_date_sk is not null) (type: boolean)
                                                                  |           Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |           TableScan [TS_4]
                                                                  |              alias:catalog_sales
                                                                  |              Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                  |<-Reducer 2 [SIMPLE_EDGE]
                                                                     Reduce Output Operator
[RS_25]
                                                                        key expressions:_col8
(type: int), _col7 (type: int)
                                                                        Map-reduce partition
columns:_col8 (type: int), _col7 (type: int)
                                                                        sort order:++
                                                                        Statistics:Num rows:
1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        value expressions:_col0
(type: int), _col1 (type: int), _col3 (type: int), _col5 (type: int), _col6 (type: int), _col10
(type: int)
                                                                        Merge Join Operator
[MERGEJOIN_107]
                                                                        |  condition map:[{"":"Inner
Join 0 to 1"}]
                                                                        |  keys:{"0":"_col2
(type: int), _col1 (type: int), _col4 (type: int)","1":"_col2 (type: int), _col1 (type: int),
_col3 (type: int)"}
                                                                        |  outputColumnNames:["_col0","_col1","_col3","_col5","_col6","_col7","_col8","_col10"]
                                                                        |  Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |<-Map 1 [SIMPLE_EDGE]
                                                                        |  Reduce Output Operator
[RS_20]
                                                                        |     key expressions:_col2
(type: int), _col1 (type: int), _col4 (type: int)
                                                                        |     Map-reduce partition
columns:_col2 (type: int), _col1 (type: int), _col4 (type: int)
                                                                        |     sort order:+++
                                                                        |     Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |     value expressions:_col0
(type: int), _col3 (type: int), _col5 (type: int)
                                                                        |     Select Operator
[SEL_1]
                                                                        |        outputColumnNames:["_col0","_col1","_col2","_col3","_col4","_col5"]
                                                                        |        Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |        Filter Operator
[FIL_99]
                                                                        |           predicate:((((ss_customer_sk
is not null and ss_item_sk is not null) and ss_ticket_number is not null) and ss_sold_date_sk
is not null) and ss_store_sk is not null) (type: boolean)
                                                                        |           Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |           TableScan
[TS_0]
                                                                        |              alias:store_sales
                                                                        |              Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                        |<-Map 11 [SIMPLE_EDGE]
                                                                           Reduce Output Operator
[RS_22]
                                                                              key expressions:_col2
(type: int), _col1 (type: int), _col3 (type: int)
                                                                              Map-reduce partition
columns:_col2 (type: int), _col1 (type: int), _col3 (type: int)
                                                                              sort order:+++
                                                                              Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                              value expressions:_col0
(type: int), _col4 (type: int)
                                                                              Select Operator
[SEL_3]
                                                                                 outputColumnNames:["_col0","_col1","_col2","_col3","_col4"]
                                                                                 Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                                 Filter Operator
[FIL_100]
                                                                                    predicate:(((sr_customer_sk
is not null and sr_item_sk is not null) and sr_ticket_number is not null) and sr_returned_date_sk
is not null) (type: boolean)
                                                                                    Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
                                                                                    TableScan
[TS_2]
                                                                                       alias:store_returns
                                                                                       Statistics:Num
rows: 1 Data size: 0 Basic stats: PARTIAL Column stats: NONE
{code}

The query is:
{code}
explain
select i_item_desc, i_category, i_class, i_current_price, i_item_id,
       sum(ws_ext_sales_price) as itemrevenue,
       sum(ws_ext_sales_price)*100/sum(sum(ws_ext_sales_price)) over (partition by i_class) as revenueratio
from web_sales, item, date_dim
where web_sales.ws_item_sk = item.i_item_sk
  and item.i_category in ('Jewelry', 'Sports', 'Books')
  and web_sales.ws_sold_date_sk = date_dim.d_date_sk
  and date_dim.d_date between '2001-01-12' and '2001-02-11'
group by i_item_id, i_item_desc, i_category, i_class, i_current_price
order by i_category, i_class, i_item_id, i_item_desc, revenueratio
limit 100;
{code}

It seems that SemanticAnalyzer.genJoinReduceSinkChild() looks for join predicates only
in the 'ON' clause. If the join condition appears in the 'WHERE' clause of the query, we
eagerly throw an exception in strict mode, assuming the join is a cartesian product. We
should defer this check until after the physical optimizer has run, when the plan is complete.
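
To illustrate the proposal, here is a minimal sketch of a check that runs against the
final plan instead of at analysis time. It is a toy model, not Hive code: the class
DeferredCartesianCheck, the JoinNode type, and both methods are hypothetical names, and
the sketch assumes that by the time the plan is complete the optimizer has turned the
'WHERE' equalities into join keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical toy model; none of these types exist in Hive.
class DeferredCartesianCheck {

    /** A join reduced to what the check needs: a label and its equi-join keys. */
    static final class JoinNode {
        final String label;
        final List<String> keys;

        JoinNode(String label, List<String> keys) {
            this.label = label;
            this.keys = keys;
        }
    }

    /** Returns the labels of all joins that have no join keys. */
    static List<String> findCartesianProducts(List<JoinNode> joins) {
        List<String> offenders = new ArrayList<>();
        for (JoinNode join : joins) {
            if (join.keys.isEmpty()) {
                offenders.add(join.label);
            }
        }
        return offenders;
    }

    /** Strict-mode enforcement, intended to be called on the completed plan only. */
    static void enforceStrictMode(List<JoinNode> joins) {
        List<String> offenders = findCartesianProducts(joins);
        if (!offenders.isEmpty()) {
            throw new IllegalStateException(
                "strict mode: cartesian product in " + offenders);
        }
    }

    public static void main(String[] args) {
        // At analysis time the comma joins carry no ON-clause keys.
        List<JoinNode> atAnalysis = Arrays.asList(
            new JoinNode("web_sales, item", Collections.<String>emptyList()),
            new JoinNode("web_sales, date_dim", Collections.<String>emptyList()));

        // Assumed state of the same joins once the plan is complete.
        List<JoinNode> finalPlan = Arrays.asList(
            new JoinNode("web_sales, item",
                Arrays.asList("ws_item_sk = i_item_sk")),
            new JoinNode("web_sales, date_dim",
                Arrays.asList("ws_sold_date_sk = d_date_sk")));

        System.out.println("flagged at analysis time: "
            + findCartesianProducts(atAnalysis));
        System.out.println("flagged on final plan: "
            + findCartesianProducts(finalPlan));

        // Passes without throwing: every join in the final plan has keys.
        enforceStrictMode(finalPlan);
    }
}
```

The same predicate (a join with no keys) is applied in both cases; only the point at
which it runs changes, which is the essence of the suggestion above.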



