hive-user mailing list archives

From ameet chaubal <>
Subject Re: Condition for doing a sort merge bucket map join
Date Tue, 22 May 2012 18:25:33 GMT
You need bucket columns = join keys = sort columns. Once that condition held,
I was able to make SMB work.
If even one of the join keys is a partition column (and so cannot be part of the clustering/sorting set),
it did not work for me.
So I'd say just check that all 7 tables are joined on the same keys, and that every table is clustered
and sorted by all of those keys.
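
A minimal sketch of that condition, not a tested recipe. The tables t1, t2 and t1_staging and the columns k1, k2, v1, v2 are hypothetical names used only for illustration. The hive.enforce.* settings, the BucketizedHiveInputFormat input format and the MAPJOIN hint are what I would expect to need on Hive releases of this period; treat them as assumptions and check them against your version.

```sql
-- Hypothetical tables t1 and t2, joined on k1 and k2 (illustrative names).
-- Bucket columns = sort columns = join keys, with the same bucket count on both sides.
CREATE TABLE t1 (k1 BIGINT, k2 BIGINT, v1 STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

CREATE TABLE t2 (k1 BIGINT, k2 BIGINT, v2 STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

-- Assumption: on Hive releases of this era, inserts only produce properly
-- bucketed and sorted files when these two settings are enabled.
set hive.enforce.bucketing = true;
set hive.enforce.sorting = true;

-- t1_staging is a hypothetical unbucketed source table; repeat for every table.
INSERT OVERWRITE TABLE t1 SELECT k1, k2, v1 FROM t1_staging;

-- The two optimizer settings from the original question, plus the bucketized
-- input format (assumed to be required alongside them).
set hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

-- Join on exactly the clustered/sorted keys, with a map join hint.
SELECT /*+ MAPJOIN(b) */ a.k1, a.k2, a.v1, b.v2
FROM t1 a JOIN t2 b ON a.k1 = b.k1 AND a.k2 = b.k2;
```

Running EXPLAIN on the final query should show whether a sorted merge bucket map join operator was actually chosen for the plan.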


 From: Bruce Bian <>
Sent: Tuesday, May 22, 2012 11:07 AM
Subject: Condition for doing a sort merge bucket map join

Hi,
I've got 7 large tables (each ~10G in size) to join into one table, all with the same 2 join
keys. I've read some documents on sort merge bucket map join, but failed to trigger it.
I've bucketed all 7 tables into 20 buckets and sorted them by one of the join keys.
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
I set the above parameters while doing the join.
What else am I missing? Do I have to bucket on both of the join keys (I'm currently trying this)?
And does each bucket file have to be smaller than one HDFS block?
Thanks a lot.