hive-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Hao Zhu <h...@maprtech.com>
Subject What is the difference in explain output between Bucket Map Join and normal Map Join?
Date Tue, 18 Nov 2014 22:20:37 GMT
Hi All,

I am testing the different explain plan output and found it hard to find
the difference between Bucket Map Join(non sort merged) and normal Map Join.

The tested hive version is 0.13.
To enable Bucket Map Join, there are 2 conditions:
1. set hive.optimize.bucketmapjoin = true;
2. If the tables being joined are bucketized on the join columns, and the
number of buckets in one table is a multiple of the number of buckets in
the other table, the buckets can be joined with each other.

Here are my tests:
*1. Create 2 Bucketed Tables(non-sorted.)*
create table b1(col0 string,col1 string,col2 string,col3 string,col4
string,col5 string,col6 string)
clustered by (col0) into 32 buckets;
create table b2(col0 string,col1 string,col2 string,col3 string,col4
string,col5 string,col6 string)
clustered by (col0) into 8 buckets;
set hive.enforce.bucketing = true;
>From passwords insert OVERWRITE  table b1 select * limit 10000;
>From passwords insert OVERWRITE  table b2 select * limit 10000;

*2. set hive.optimize.bucketmapjoin = true;*

> hive> explain  select  b1.* from b1,b2 where b1.col0=b2.col0;
> OK
> STAGE DEPENDENCIES:
>   Stage-4 is a root stage
>   Stage-3 depends on stages: Stage-4
>   Stage-0 is a root stage
> STAGE PLANS:
>   Stage: Stage-4
>     Map Reduce Local Work
>       Alias -> Map Local Tables:
>         b1
>           Fetch Operator
>             limit: -1
>       Alias -> Map Local Operator Tree:
>         b1
>           TableScan
>             alias: b1
>             Statistics: Num rows: 10000 Data size: 478869 Basic stats:
> COMPLETE Column stats: NONE
>             HashTable Sink Operator
>               condition expressions:
>                 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6}
>                 1 {col0}
>               keys:
>                 0 col0 (type: string)
>                 1 col0 (type: string)
>   Stage: Stage-3
>     Map Reduce
>       Map Operator Tree:
>           TableScan
>             alias: b2
>             Statistics: Num rows: 10000 Data size: 478869 Basic stats:
> COMPLETE Column stats: NONE
>             Map Join Operator
>               condition map:
>                    Inner Join 0 to 1
>               condition expressions:
>                 0 {col0} {col1} {col2} {col3} {col4} {col5} {col6}
>                 1 {col0}
>               keys:
>                 0 col0 (type: string)
>                 1 col0 (type: string)
>               outputColumnNames: _col0, _col1, _col2, _col3, _col4, _col5,
> _col6, _col9
>               Statistics: Num rows: 11000 Data size: 526755 Basic stats:
> COMPLETE Column stats: NONE
>               Filter Operator
>                 predicate: (_col0 = _col9) (type: boolean)
>                 Statistics: Num rows: 5500 Data size: 263377 Basic stats:
> COMPLETE Column stats: NONE
>                 Select Operator
>                   expressions: _col0 (type: string), _col1 (type: string),
> _col2 (type: string), _col3 (type: string), _col4 (type: string), _col5
> (type: string), _col6 (type: string)
>                   outputColumnNames: _col0, _col1, _col2, _col3, _col4,
> _col5, _col6
>                   Statistics: Num rows: 5500 Data size: 263377 Basic
> stats: COMPLETE Column stats: NONE
>                   File Output Operator
>                     compressed: false
>                     Statistics: Num rows: 5500 Data size: 263377 Basic
> stats: COMPLETE Column stats: NONE
>                     table:
>                         input format:
> org.apache.hadoop.mapred.TextInputFormat
>                         output format:
> org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
>                         serde:
> org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
>       Local Work:
>         Map Reduce Local Work
>   Stage: Stage-0
>     Fetch Operator
>       limit: -1
> Time taken: 0.171 seconds, Fetched: 63 row(s)

So we only see normal Map Join operator.

*My question is :*


   - What operator will Bucket map join show in explain output?
   - The same as normal Map Join?If so, how do we know if the SQL is using
   bucket map join or normal map join?

Thanks.


Thanks,

Hao

Mime
View raw message