hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/Joins" by Ning Zhang
Date Thu, 11 Mar 2010 19:45:11 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/Joins" page has been changed by Ning Zhang.
http://wiki.apache.org/hadoop/Hive/LanguageManual/Joins?action=diff&rev1=10&rev2=11

--------------------------------------------------

  join_table:
      table_reference JOIN table_factor [join_condition]
    | table_reference {LEFT|RIGHT|FULL} [OUTER] JOIN table_reference join_condition
+   | table_reference LEFT SEMI JOIN table_reference join_condition
  
  table_reference:
      table_factor
@@ -27, +28 @@

      expression = expression
  }}}
  
- Only equality joins and outer joins are supported in Hive. Hive does not support join conditions
that are not equality
+ Only equality joins, outer joins, and left semi joins are supported in Hive. Hive does not
support join conditions that are not equality
  conditions as it is very difficult to express such conditions as a map/reduce job. Also,
more than two tables can be
  joined in Hive.
  
@@ -92, +93 @@

    LEFT OUTER JOIN c ON (a.key = c.key)
  }}}
    ...first joins a on b, throwing away everything in a or b that does not have a corresponding
key in the other table. The reduced table is then joined on c. This provides unintuitive results
if there is a key that exists in both a and c, but not b: The whole row (including a.val1,
a.val2, and a.key) is dropped in the "a JOIN b" step, so when the result of that is joined
with c, any row with a c.key that had a corresponding a.key or b.key (but not both) will show
up as NULL, NULL, NULL, c.val.
+  * LEFT SEMI JOIN implements the correlated IN/EXISTS subquery semantics in an efficient
way. Since Hive currently does not support IN/EXISTS subqueries, you can rewrite your queries
using LEFT SEMI JOIN. The restrictions of using LEFT SEMI JOIN is that the right-hand-side
table should only be referenced in the join condition (ON-clause), but not in WHERE- or SELECT-clauses
etc.  
+ {{{
+   SELECT a.key, a.value
+   FROM a 
+   WHERE a.key in 
+    (SELECT b.key
+     FROM B);
+ }}}
+ can be rewritten to:
+ {{{
+    SELECT a.key, a.val
+    FROM a LEFT SEMI JOIN b on (a.key = b.key)
+ }}}   
  

Mime
View raw message