hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/Joins" by NamitJain
Date Wed, 31 Mar 2010 21:01:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/Joins" page has been changed by NamitJain.


    SELECT a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key2)
    there are two map/reduce jobs involved in computing the join. The first of these jobs joins
a with b and buffers the values of a while streaming the values of b in the reducers. The
second of these jobs buffers the results of the first join while streaming the values
of c through the reducers.
+  * In every map/reduce stage of the join, the table to be streamed can be specified via
a hint. For example, in
+ {{{
+   SELECT /*+ STREAMTABLE(a) */ a.val, b.val, c.val FROM a JOIN b ON (a.key = b.key1) JOIN c ON (c.key = b.key1)
+ }}}
+   all three tables are joined in a single map/reduce job, and the values for a particular
value of the key for tables b and c are buffered in memory in the reducers. Then, for each
row retrieved from a, the join is computed with the buffered rows.
   * LEFT, RIGHT, and FULL OUTER joins exist in order to provide more control over ON clauses
for which there is no match. For example, this query:
    SELECT a.val, b.val FROM a LEFT OUTER JOIN b ON (a.key=b.key)
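
The reducer behaviour described for the STREAMTABLE hint above can be illustrated with a small Python sketch. It is not Hive's implementation, only a toy model of the idea: rows of the non-streamed tables are buffered per join key, and rows of the streamed table are read one at a time and joined against those buffers. The function name `reduce_side_join`, the `(key, value)` row layout, and the sample data are all assumptions made for illustration, and the sketch assumes every table joins on the same key, as in the hinted query.

```python
from collections import defaultdict
from itertools import product


def reduce_side_join(tables, stream_table):
    """Toy inner join of several tables on one shared key.

    tables: dict mapping a table name to a list of (key, value) rows.
    stream_table: name of the table whose rows are streamed, not buffered.
    Returns a list of (key, {table_name: value}) result rows.
    """
    # Buffer every table except the streamed one, grouped by join key.
    buffered = {
        name: defaultdict(list) for name in tables if name != stream_table
    }
    for name, groups in buffered.items():
        for key, value in tables[name]:
            groups[key].append(value)

    results = []
    # Stream the remaining table row by row and join each row against
    # the buffered rows that share its key.
    for key, value in tables[stream_table]:
        matches = [buffered[name].get(key, []) for name in buffered]
        # product() yields nothing if any buffered table has no match,
        # which gives inner-join semantics.
        for combo in product(*matches):
            row = {stream_table: value}
            row.update(zip(buffered.keys(), combo))
            results.append((key, row))
    return results


# Hypothetical sample data: table name -> rows of (key, val).
sample = {
    "a": [(1, "a1"), (2, "a2")],
    "b": [(1, "b1"), (1, "b2")],
    "c": [(1, "c1")],
}

for joined_row in reduce_side_join(sample, stream_table="a"):
    print(joined_row)
```

In this model the choice of streamed table changes only which rows are held in memory, not the join result, which is why it can make sense to stream the largest table.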
