hive-user mailing list archives

From Ayon Sinha <>
Subject Re: multiple tables join with only one huge table.
Date Fri, 12 Aug 2011 02:25:17 GMT
The MAPJOIN hint helps optimize this join by loading the smaller tables named in the hint
into memory, so each mapper holds a copy of every small table in memory.
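
To make the idea concrete, here is a rough Python sketch of what a map-side hash join does conceptually. It is not Hive code and not how Hive implements it internally; the column names (store_id, product_id, period_id, amount) and the helper functions are made up for illustration. Each small table is loaded into an in-memory hash, and the fact rows are streamed past them exactly once.

```python
# Hypothetical sketch of a map-side (hash) join.
# Table contents, column names, and function names are illustrative assumptions.

def build_hash(rows, key):
    """Index a small dimension table by its join key (held fully in memory)."""
    return {row[key]: row for row in rows}


def map_side_join(fact_rows, stores, products, period):
    """Stream the fact rows once, probing each in-memory hash table in turn."""
    stores_by_id = build_hash(stores, "store_id")
    products_by_id = build_hash(products, "product_id")
    period_by_id = build_hash(period, "period_id")

    for fact in fact_rows:
        store = stores_by_id.get(fact["store_id"])
        if store is None:
            continue  # inner join: drop the row on the first miss
        product = products_by_id.get(fact["product_id"])
        if product is None:
            continue
        per = period_by_id.get(fact["period_id"])
        if per is None:
            continue
        # All three probes matched: emit the combined row.
        yield {**fact, **store, **product, **per}


stores = [{"store_id": 1, "store_name": "north"}]
products = [{"product_id": 7, "product_name": "widget"}]
period = [{"period_id": 3, "quarter": "Q3"}]
sale_fact = [
    {"store_id": 1, "product_id": 7, "period_id": 3, "amount": 10},
    {"store_id": 2, "product_id": 7, "period_id": 3, "amount": 5},  # no matching store
]

joined = list(map_side_join(sale_fact, stores, products, period))
```

In this toy data only the first fact row survives, because the second has no matching store. The point is that the big table is read a single time while the small tables stay resident in memory.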

From: "Daniel,Wu" <>
To: hive <>
Sent: Thursday, August 11, 2011 7:01 PM
Subject: multiple tables join with only one huge table.

If the retailer fact table is sale_fact with 10B rows, and it joins with 3 small tables: stores
(10K), products (10K), and period (1K), what's the best join solution?

In Oracle, it can first build a hash for stores, a hash for products, and a hash for period.
Then it probes using the fact table: if a row matches in stores, that row goes on to be
checked against the products hash, and if it passes, it goes on to try to match period.
This way, sale_fact only needs to be scanned once, which saves a lot of disk I/O. Is
this doable in Hive, and if so, which hint do I need to use?