hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/Joins" by NamitJain
Date Wed, 31 Mar 2010 21:29:59 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/Joins" page has been changed by NamitJain.


  can be done on the mapper only. Instead of fetching B completely for each mapper of A, only
the required buckets are fetched. For the query above, the mapper processing bucket 1 for
A will only fetch bucket 1 of B.
  This is not the default behavior; it is governed by the following parameter: '''set hive.optimize.bucketmapjoin = true'''
+  * If the tables being joined are sorted and bucketized, and have the same number of buckets, a sort-merge join can be performed. The corresponding buckets are joined with each other at the mapper. If both A and B have 4 buckets,
+ {{{
+   SELECT /*+ MAPJOIN(b) */ a.key, a.value
+   FROM A a join B b on a.key = b.key
+ }}}
+ can be done on the mapper only. The mapper processing a bucket of A will traverse the corresponding bucket of B. This is not the default behavior, and the following parameters need to be set:
+   '''set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat'''
+   '''set hive.optimize.bucketmapjoin = true'''
+   '''set hive.optimize.bucketmapjoin.sortedmerge = true'''
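
As a minimal sketch of the setup described above, the following HiveQL declares two tables that meet the stated preconditions (bucketized and sorted, with the same number of buckets) and then runs the join with the three parameters set. The column types (INT, STRING) and the choice of key as both the bucketing and the sort column are assumptions made for illustration; the page only gives the table names, the columns used in the query, and the bucket count. Note that the DDL only declares the layout, so the data still has to be written into the tables bucketed and sorted accordingly.

```sql
-- Assumed schema: column types are illustrative, not taken from the page.
-- Both tables are bucketized and sorted on the join key, with 4 buckets each.
CREATE TABLE A (key INT, value STRING)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS;

CREATE TABLE B (key INT, value STRING)
CLUSTERED BY (key) SORTED BY (key ASC) INTO 4 BUCKETS;

-- Parameters listed on the page; the sort-merge join is not enabled by default.
set hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;
set hive.optimize.bucketmapjoin = true;
set hive.optimize.bucketmapjoin.sortedmerge = true;

-- Query from the page: each mapper joins a bucket of A with the
-- corresponding bucket of B.
SELECT /*+ MAPJOIN(b) */ a.key, a.value
FROM A a join B b on a.key = b.key;
```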
