hive-issues mailing list archives

From "Jason Dere (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (HIVE-12535) Dynamic Hash Join: Key references are cyclic
Date Sun, 29 Nov 2015 23:31:10 GMT

    [ https://issues.apache.org/jira/browse/HIVE-12535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15031217#comment-15031217 ]

Jason Dere commented on HIVE-12535:
-----------------------------------

It looks like that output is due to the user-level explain formatting that is done in common/src/java/org/apache/hadoop/hive/common/jsonexplain/tez/Op.java.


After the initial plan is created the MapJoin operator looks like this:

{noformat}
"keys:":{"0":"KEY.reducesinkkey0 (type: int)","1":"KEY.reducesinkkey0 (type: int)"}
"input vertices:":{"1":"Map 3"}
{noformat}

Because input "0" (which I think is the big table in this case) is not in the "input vertices"
list, Op.java resolves it to the current vertex ("Reducer 2").
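
To illustrate the fallback described above, here is a minimal Python sketch. It is not the actual Op.java code: the function name resolve_join_keys and its parameters are hypothetical, and the only assumption is that a key position missing from "input vertices:" is labelled with the vertex that contains the MapJoin.

```python
def resolve_join_keys(keys, input_vertices, current_vertex):
    """Relabel MapJoin key positions with vertex names for display.

    Hypothetical stand-in for the user-level explain formatting; the
    real logic lives in Op.java and may differ in detail.
    """
    resolved = {}
    for position, expression in keys.items():
        # A position absent from "input vertices:" falls back to the
        # vertex that contains the MapJoin itself.
        vertex = input_vertices.get(position, current_vertex)
        resolved[vertex] = expression
    return resolved


# Values taken from the plan fragment quoted in this comment.
keys = {
    "0": "KEY.reducesinkkey0 (type: int)",
    "1": "KEY.reducesinkkey0 (type: int)",
}
input_vertices = {"1": "Map 3"}

resolved = resolve_join_keys(keys, input_vertices, "Reducer 2")
for vertex, expression in resolved.items():
    print(f"{vertex}: {expression}")
```

With these inputs, position "0" is displayed as "Reducer 2" only because it has no entry in the mapping, which is the self-reference seen in the explain output.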

So this issue in the explain output is purely cosmetic. If there is a similar issue in
the vectorizer, though, it could also be related to the fact that the input vertices for the MapJoin
do not include the big table. I'm not sure whether that mapping is supposed to include
the big table; someone else may need to comment on that.

> Dynamic Hash Join: Key references are cyclic
> --------------------------------------------
>
>                 Key: HIVE-12535
>                 URL: https://issues.apache.org/jira/browse/HIVE-12535
>             Project: Hive
>          Issue Type: Bug
>          Components: Query Planning
>    Affects Versions: 2.0.0
>            Reporter: Gopal V
>            Assignee: Jason Dere
>         Attachments: philz_26.txt
>
>
> MAPJOIN_4227 is inside "Reducer 2", but refers back to "Reducer 2" in its keys. It should say "Map 1" there.
> {code}
> |                |<-Reducer 2 [SIMPLE_EDGE] vectorized, llap
> |                   Reduce Output Operator [RS_4189]
> |                      key expressions:_col0 (type: string), _col1 (type: int)
> |                      Map-reduce partition columns:_col0 (type: string), _col1 (type: int)
> |                      sort order:++
> |                      Statistics:Num rows: 83 Data size: 9213 Basic stats: COMPLETE Column stats: COMPLETE
> |                      value expressions:_col2 (type: double)
> |                      Group By Operator [OP_4229]
> |                         aggregations:["sum(_col2)"]
> |                         keys:_col0 (type: string), _col1 (type: int)
> |                         outputColumnNames:["_col0","_col1","_col2"]
> |                         Statistics:Num rows: 83 Data size: 9213 Basic stats: COMPLETE Column stats: COMPLETE
> |                         Select Operator [OP_4228]
> |                            outputColumnNames:["_col0","_col1","_col2"]
> |                            Statistics:Num rows: 166 Data size: 26394 Basic stats: COMPLETE Column stats: COMPLETE
> |                            Map Join Operator [MAPJOIN_4227]
> |                            |  condition map:[{"":"Inner Join 0 to 1"}]
> |                            |  keys:{"Reducer 2":"KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: int), KEY.reducesinkkey2 (type: int)","Map 5":"KEY.reducesinkkey0 (type: bigint), KEY.reducesinkkey1 (type: int), KEY.reducesinkkey2 (type: int)"}
> |                            |  outputColumnNames:["_col1","_col3","_col5"]
> |                            |  Statistics:Num rows: 166 Data size: 26394 Basic stats: COMPLETE Column stats: COMPLETE
> |                            |<-Map 5 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> |                            |  Reduce Output Operator [RS_4226]
> |                            |     key expressions:_col1 (type: bigint), year(_col2) (type: int), month(_col2) (type: int)
> |                            |     Map-reduce partition columns:_col1 (type: bigint), year(_col2) (type: int), month(_col2) (type: int)
> |                            |     sort order:+++
> |                            |     Statistics:Num rows: 74973886 Data size: 5098224248 Basic stats: COMPLETE Column stats: COMPLETE
> |                            |     value expressions:_col0 (type: float), _col2 (type: date)
> |                            |     Select Operator [OP_4225]
> |                            |        outputColumnNames:["_col0","_col1","_col2"]
> |                            |        Statistics:Num rows: 74973886 Data size: 5098224248 Basic stats: COMPLETE Column stats: COMPLETE
> |                            |        Filter Operator [FIL_4224]
> |                            |           predicate:((account_id is not null and month(effective_date) BETWEEN 4 AND 7) and month(effective_date) is not null) (type: boolean)
> |                            |           Statistics:Num rows: 74973886 Data size: 5098224248 Basic stats: COMPLETE Column stats: COMPLETE
> |                            |           TableScan [TS_4171]
> |                            |              alias:t
> |                            |              Statistics:Num rows: 149947772 Data size: 10196448496 Basic stats: COMPLETE Column stats: COMPLETE
> |                            |<-Map 1 [CUSTOM_SIMPLE_EDGE] vectorized, llap
> |                               Reduce Output Operator [RS_4223]
> |                                  key expressions:_col0 (type: bigint), year(_col2) (type: int), month(_col2) (type: int)
> |                                  Map-reduce partition columns:_col0 (type: bigint), year(_col2) (type: int), month(_col2) (type: int)
> |                                  sort order:+++
> |                                  Statistics:Num rows: 50289673 Data size: 8197216699 Basic stats: COMPLETE Column stats: COMPLETE
> |                                  value expressions:_col1 (type: string)
> |                                  Map Join Operator [MAPJOIN_4222]
> |                                  |  condition map:[{"":"Left Semi Join 0 to 1"}]
> |                                  |  keys:{"Map 1":"_col1 (type: string)","Map 4":"_col0 (type: string)"}
> |                                  |  outputColumnNames:["_col0","_col1","_col2"]
> |                                  |  Statistics:Num rows: 50289673 Data size: 8197216699 Basic stats: COMPLETE Column stats: COMPLETE
> |                                  |<-Map 4 [BROADCAST_EDGE] vectorized, llap
> |                                  |  Reduce Output Operator [RS_4179]
> |                                  |     key expressions:_col0 (type: string)
> |                                  |     Map-reduce partition columns:_col0 (type: string)
> |                                  |     sort order:+
> |                                  |     Statistics:Num rows: 1 Data size: 99 Basic stats: COMPLETE Column stats: COMPLETE
> |                                  |     Group By Operator [OP_4219]
> |                                  |        keys:_col0 (type: string)
> |                                  |        outputColumnNames:["_col0"]
> |                                  |        Statistics:Num rows: 1 Data size: 99 Basic stats: COMPLETE Column stats: COMPLETE
> |                                  |        Select Operator [OP_4218]
> |                                  |           outputColumnNames:["_col0"]
> |                                  |           Statistics:Num rows: 3 Data size: 297 Basic stats: COMPLETE Column stats: COMPLETE
> |                                  |           Filter Operator [FIL_4217]
> |                                  |              predicate:(account_type = 'order ahead') (type: boolean)
> |                                  |              Statistics:Num rows: 3 Data size: 294 Basic stats: COMPLETE Column stats: COMPLETE
> |                                  |              TableScan [TS_4168]
> |                                  |                 alias:at
> |                                  |                 Statistics:Num rows: 13 Data size: 1274 Basic stats: COMPLETE Column stats: COMPLETE
> |                                  |<-Select Operator [OP_4221]
> |                                        outputColumnNames:["_col0","_col1","_col2"]
> |                                        Statistics:Num rows: 50289673 Data size: 8197216699 Basic stats: COMPLETE Column stats: COMPLETE
> |                                        Filter Operator [FIL_4220]
> |                                           predicate:(((account_id is not null and (account_type = 'order ahead')) and year(effective_date) is not null) and month(effective_date) is not null) (type: boolean)
> |                                           Statistics:Num rows: 50289673 Data size: 8197216699 Basic stats: COMPLETE Column stats: COMPLETE
> |                                           TableScan [TS_4165]
> |                                              alias:a
> |                                              Statistics:Num rows: 201158695 Data size: 32788867285 Basic stats: COMPLETE Column stats: COMPLETE
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
