hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From sky88088 <sky880883...@hotmail.com>
Subject about bucket map join
Date Thu, 22 May 2014 05:51:40 GMT
Hi all,	I encounter problems when trying to run map-join for large tables by using bucketing
tables.		The table is partitioned into 20 buckets:	create table if not exists ip_c_bucket
(country string, ip_from bigint, ip_to bigint) clustered by(ip_from) into 20 buckets;
	For test, I let this table join itself:	select * from ip_c_bucket a join ip_c_bucket b on
a.ip_from=b.ip_from;
	It's found that the parameter hive.optimize.bucketmapjoin has nothing to do with the query
plan. Whether this parameter is set to true or not, the query is the same. However, it seems
that map-join has been used from the query plan. 
	Since the table is joining itself, two tables of course have the same number of buckets.
Therefore I try sort-merge join by setting parameters : 	set hive.optimize.bucketmapjoin =
true; 	set hive.optimize.bucketmapjoin.sortedmerge = true; 	set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

	However the query plan is still the same.
	Any explanation about these problems?


        Here is the query plan:
	STAGE DEPENDENCIES:  Stage-7 is a root stage , consists of Stage-9, Stage-10, Stage-1  Stage-9
has a backup stage: Stage-1  Stage-5 depends on stages: Stage-9  Stage-4 depends on stages:
Stage-1, Stage-5, Stage-6 , consists of Stage-8  Stage-8  Stage-3 depends on stages: Stage-8
 Stage-10 has a backup stage: Stage-1  Stage-6 depends on stages: Stage-10  Stage-1  Stage-0
is a root stage
STAGE PLANS:  Stage: Stage-7    Conditional Operator
  Stage: Stage-9    Map Reduce Local Work      Alias -> Map Local Tables:        b    
      Fetch Operator            limit: -1      Alias -> Map Local Operator Tree:      
 b           TableScan            alias: b            HashTable Sink Operator            
 condition expressions:                0 {country} {ip_from} {ip_to}                1 {country}
{ip_from} {ip_to}              handleSkewJoin: false              keys:                0 [Column[ip_from]]
               1 [Column[ip_from]]              Position of Big Table: 0
  Stage: Stage-5    Map Reduce      Alias -> Map Operator Tree:        a           TableScan
           alias: a            Map Join Operator              condition map:             
     Inner Join 0 to 1              condition expressions:                0 {country} {ip_from}
{ip_to}                1 {country} {ip_from} {ip_to}              handleSkewJoin: false  
           keys:                0 [Column[ip_from]]                1 [Column[ip_from]]   
          outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7              Position
of Big Table: 0              Select Operator                expressions:                 
    expr: _col0                      type: string                      expr: _col1       
              type: bigint                      expr: _col2                      type: bigint
                     expr: _col5                      type: string                      expr:
_col6                      type: bigint                      expr: _col7                 
    type: bigint                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
               File Output Operator                  compressed: false                  GlobalTableId:
0                  table:                      input format: org.apache.hadoop.mapred.TextInputFormat
                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
     Local Work:        Map Reduce Local Work
  Stage: Stage-4    Conditional Operator
  Stage: Stage-8    Map Reduce Local Work      Alias -> Map Local Tables:        1    
      Fetch Operator            limit: -1      Alias -> Map Local Operator Tree:      
 1             HashTable Sink Operator              condition expressions:               
0 {0_VALUE_0} {0_VALUE_1} {0_VALUE_2}                1 {1_VALUE_0} {1_VALUE_1} {1_VALUE_2}
             handleSkewJoin: false              keys:                0 [Column[joinkey0]]
               1 [Column[joinkey0]]              Position of Big Table: 0
  Stage: Stage-3    Map Reduce      Alias -> Map Operator Tree:        0             Map
Join Operator              condition map:                   Inner Join 0 to 1            
 condition expressions:                0 {0_VALUE_0} {0_VALUE_1} {0_VALUE_2}             
  1 {1_VALUE_0} {1_VALUE_1} {1_VALUE_2}              handleSkewJoin: false              keys:
               0 [Column[joinkey0]]                1 [Column[joinkey0]]              outputColumnNames:
_col0, _col1, _col2, _col5, _col6, _col7              Position of Big Table: 0           
  Select Operator                expressions:                      expr: _col0           
          type: string                      expr: _col1                      type: bigint
                     expr: _col2                      type: bigint                      expr:
_col5                      type: string                      expr: _col6                 
    type: bigint                      expr: _col7                      type: bigint      
         outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5                File Output
Operator                  compressed: false                  GlobalTableId: 0            
     table:                      input format: org.apache.hadoop.mapred.TextInputFormat  
                   output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
     Local Work:        Map Reduce Local Work
  Stage: Stage-10    Map Reduce Local Work      Alias -> Map Local Tables:        a   
       Fetch Operator            limit: -1      Alias -> Map Local Operator Tree:     
  a           TableScan            alias: a            HashTable Sink Operator           
  condition expressions:                0 {country} {ip_from} {ip_to}                1 {country}
{ip_from} {ip_to}              handleSkewJoin: false              keys:                0 [Column[ip_from]]
               1 [Column[ip_from]]              Position of Big Table: 1
  Stage: Stage-6    Map Reduce      Alias -> Map Operator Tree:        b           TableScan
           alias: b            Map Join Operator              condition map:             
     Inner Join 0 to 1              condition expressions:                0 {country} {ip_from}
{ip_to}                1 {country} {ip_from} {ip_to}              handleSkewJoin: false  
           keys:                0 [Column[ip_from]]                1 [Column[ip_from]]   
          outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7              Position
of Big Table: 1              Select Operator                expressions:                 
    expr: _col0                      type: string                      expr: _col1       
              type: bigint                      expr: _col2                      type: bigint
                     expr: _col5                      type: string                      expr:
_col6                      type: bigint                      expr: _col7                 
    type: bigint                outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5
               File Output Operator                  compressed: false                  GlobalTableId:
0                  table:                      input format: org.apache.hadoop.mapred.TextInputFormat
                     output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
     Local Work:        Map Reduce Local Work
  Stage: Stage-1    Map Reduce      Alias -> Map Operator Tree:        a           TableScan
           alias: a            Reduce Output Operator              key expressions:      
             expr: ip_from                    type: bigint              sort order: +    
         Map-reduce partition columns:                    expr: ip_from                  
 type: bigint              tag: 0              value expressions:                    expr:
country                    type: string                    expr: ip_from                 
  type: bigint                    expr: ip_to                    type: bigint        b   
       TableScan            alias: b            Reduce Output Operator              key expressions:
                   expr: ip_from                    type: bigint              sort order:
+              Map-reduce partition columns:                    expr: ip_from            
       type: bigint              tag: 1              value expressions:                  
 expr: country                    type: string                    expr: ip_from          
         type: bigint                    expr: ip_to                    type: bigint     
Reduce Operator Tree:        Join Operator          condition map:               Inner Join
0 to 1          condition expressions:            0 {VALUE._col0} {VALUE._col1} {VALUE._col2}
           1 {VALUE._col0} {VALUE._col1} {VALUE._col2}          handleSkewJoin: true     
    outputColumnNames: _col0, _col1, _col2, _col5, _col6, _col7          Select Operator 
          expressions:                  expr: _col0                  type: string        
         expr: _col1                  type: bigint                  expr: _col2          
       type: bigint                  expr: _col5                  type: string           
      expr: _col6                  type: bigint                  expr: _col7             
    type: bigint            outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5  
         File Output Operator              compressed: false              GlobalTableId: 0
             table:                  input format: org.apache.hadoop.mapred.TextInputFormat
                 output format: org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
  Stage: Stage-0    Fetch Operator      limit: -1
 		 	   		  
Mime
View raw message