hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/Tutorial" by JonathanHsu
Date Wed, 04 Nov 2009 18:41:34 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/Tutorial" page has been changed by JonathanHsu.
The comment on this change is: Joining tables should have the large table on the right. Mention made and examples modified.
http://wiki.apache.org/hadoop/Hive/Tutorial?action=diff&rev1=14&rev2=15

--------------------------------------------------

  {{{     
      INSERT OVERWRITE TABLE pv_users 
      SELECT pv.*, u.gender, u.age 
-     FROM page_view pv JOIN user u ON (pv.userid = u.id) 
+     FROM user u JOIN page_view pv ON (pv.userid = u.id) 
      WHERE pv.date = '2008-03-03';  
  }}}
  
@@ -357, +357 @@

  {{{     
      INSERT OVERWRITE TABLE pv_users 
      SELECT pv.*, u.gender, u.age 
-     FROM page_view pv FULL OUTER JOIN user u ON (pv.userid = u.id) 
+     FROM user u FULL OUTER JOIN page_view pv ON (pv.userid = u.id) 
      WHERE pv.date = '2008-03-03';  
  }}}
  In order to join more than one tables, the user can use the following syntax: 
@@ -368, +368 @@

      WHERE pv.date = '2008-03-03'; 
  }}}
  
- Note that Hive only supports [[http://en.wikipedia.org/wiki/Join_(SQL)#Equi-join|equi-joins]].
+ Note that Hive only supports [[http://en.wikipedia.org/wiki/Join_(SQL)#Equi-join|equi-joins]]. Note also that it is best to put the largest table on the rightmost side of the join in order to avoid memory errors.
  
  == Aggregations ==
  In order to count the number of distinct users by gender one could write the following query:

