hive-user mailing list archives

From Mark Grover <mgro...@oanda.com>
Subject Re: Condition for doing a sort merge bucket map join
Date Tue, 22 May 2012 15:43:25 GMT
Hi Bruce,
Instead of joining 7 tables in the query, can you please start off with 2 tables and see if
that works? If it doesn't, feel free to paste your table definitions and join query, along
with any properties you are setting, and folks on the mailing list can take a stab at it.


Mark

----- Original Message -----
From: "Bruce Bian" <weidong.ban@gmail.com>
To: user@hive.apache.org
Sent: Tuesday, May 22, 2012 11:07:38 AM
Subject: Condition for doing a sort merge bucket map join

Hi,
I've got 7 large tables to join (each ~10 GB in size) into one table, all with the same 2 join
keys. I've read some documents on sort merge bucket map join, but could not get it to fire.
I've bucketed all 7 tables into 20 buckets, sorted by one of the join keys, and set the
following parameters while doing the join:
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
What else am I missing? Do I have to bucket on both of the join keys (I'm currently trying this)?
And does each bucket file have to be smaller than one HDFS block?
Thanks a lot.
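
For reference, a minimal two-table test along the lines suggested in the reply might look like the sketch below. It is not taken from either message. The table and column names (t1, t2, t1_staging, t2_staging, k1, k2, v) are hypothetical. It assumes a Hive release of that period, where bucketing and sorting had to be enforced explicitly on insert and the join needed a MAPJOIN hint. It also assumes that both tables are bucketed and sorted on exactly the join columns with the same bucket count, which is the condition the Hive documentation gives for a sort-merge bucket join.

```sql
-- Hypothetical two-table test; all table and column names are illustrative.
-- Assumption: older Hive releases only write correctly bucketed, sorted
-- files when these two properties are enabled at insert time.
SET hive.enforce.bucketing = true;
SET hive.enforce.sorting = true;

-- Both tables are bucketed AND sorted on the full set of join keys,
-- with the same number of buckets.
CREATE TABLE t1 (k1 INT, k2 INT, v STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

CREATE TABLE t2 (k1 INT, k2 INT, v STRING)
CLUSTERED BY (k1, k2) SORTED BY (k1, k2) INTO 20 BUCKETS;

-- Populate through INSERT ... SELECT so that Hive itself produces the
-- bucket files (t1_staging and t2_staging are assumed source tables).
INSERT OVERWRITE TABLE t1 SELECT k1, k2, v FROM t1_staging;
INSERT OVERWRITE TABLE t2 SELECT k1, k2, v FROM t2_staging;

-- The three properties from the original message.
SET hive.optimize.bucketmapjoin = true;
SET hive.optimize.bucketmapjoin.sortedmerge = true;
SET hive.input.format = org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;

-- Inspect the plan before running the join itself.
EXPLAIN
SELECT /*+ MAPJOIN(b) */ a.k1, a.k2, a.v AS v1, b.v AS v2
FROM t1 a JOIN t2 b ON a.k1 = b.k1 AND a.k2 = b.k2;
```

If the optimization applies, the EXPLAIN output should show a sorted-merge bucket map join operator in place of a common (reduce-side) join. If it does not, the table definitions and the plan are the pieces worth posting to the list.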
