hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/UserGuide" by DavidPhillips
Date Mon, 01 Dec 2008 21:54:38 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by DavidPhillips:
http://wiki.apache.org/hadoop/Hive/UserGuide

The comment on the change is:
added examples from Jeff's 20081030linkedin presentation

------------------------------------------------------------------------------
  == Supported Features ==
  == Usage Examples ==
  === Creating tables ===
+ 
+ ==== MovieLens User Ratings ====
+ {{{
+ CREATE TABLE u_data (
+   userid INT,
+   movieid INT,
+   rating INT,
+   unixtime STRING)
+ ROW FORMAT DELIMITED
+ FIELDS TERMINATED BY '\t';
+ }}}
  
  ==== Apache Access Log Tables ====
  {{{
@@ -19, +30 @@

  'serialization.null.format'='-')
  STORED AS TEXTFILE;
  }}}
+ 
  ==== Control Separated Tables ====
  {{{
  CREATE TABLE mylog (
@@ -32, +44 @@

  }}}
  
  === Loading tables ===
+ 
+ ==== MovieLens User Ratings ====
+ Download and extract the data:
+ {{{
+ wget http://www.grouplens.org/system/files/ml-data.tar__0.gz
+ tar xvzf ml-data.tar__0.gz
+ }}}
+ 
+ Load it in:
+ {{{
+ LOAD DATA LOCAL INPATH 'ml-data/u.data'
+ OVERWRITE INTO TABLE u_data;
+ }}}
+ 
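Before loading, it can help to confirm that the extracted file matches the tab-delimited, four-column layout declared for u_data. The following is a minimal Python sketch and is not part of the wiki change; the helper name count_malformed_rows and the sample rows are hypothetical.

```python
def count_malformed_rows(lines, expected_fields=4, delimiter='\t'):
    """Count lines that do not split into the expected number of fields."""
    bad = 0
    for line in lines:
        fields = line.rstrip('\n').split(delimiter)
        if len(fields) != expected_fields:
            bad += 1
    return bad


# Made-up sample rows in the userid, movieid, rating, unixtime layout.
# The second row uses commas, so it counts as malformed.
sample = ["196\t242\t3\t881250949\n", "186,302,3,891717742\n"]
print(count_malformed_rows(sample))
```

To check the real file, an open file handle for ml-data/u.data can be passed in place of the sample list.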
  === Running queries ===
+ 
+ ==== MovieLens User Ratings ====
+ {{{
+ SELECT COUNT(1) FROM u_data;
+ }}}
+ 
  === Running custom map/reduce jobs ===
+ 
+ ==== MovieLens User Ratings ====
+ Create weekday_mapper.py:
+ {{{
+ import sys
+ import datetime
+ 
+ for line in sys.stdin:
+   line = line.strip()
+   userid, movieid, rating, unixtime = line.split('\t')
+   weekday = datetime.datetime.fromtimestamp(float(unixtime)).isoweekday()
+   print('\t'.join([userid, movieid, rating, str(weekday)]))
+ }}}
+ 
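The per-line conversion can be tried outside Hive before wiring it into a query. This is a rough sketch under stated assumptions: the function name to_weekday_row and the sample line are hypothetical, and it uses UTC so the result does not depend on the machine's time zone, whereas the wiki script converts in local time. It emits tab-separated fields to match the tab-delimited target table.

```python
import datetime


def to_weekday_row(line):
    """Map 'userid<TAB>movieid<TAB>rating<TAB>unixtime' to a tab-separated
    row whose last field is the ISO weekday (1 = Monday, 7 = Sunday)."""
    userid, movieid, rating, unixtime = line.strip().split('\t')
    # Assumption: UTC is used here for a reproducible result; the wiki
    # script calls fromtimestamp() without a time zone (local time).
    moment = datetime.datetime.fromtimestamp(
        float(unixtime), tz=datetime.timezone.utc)
    return '\t'.join([userid, movieid, rating, str(moment.isoweekday())])


# Made-up sample line in the u.data layout.
print(to_weekday_row("196\t242\t3\t881250949\n"))
```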
+ Use the mapper script:
+ {{{
+ CREATE TABLE u_data_new (
+   userid INT,
+   movieid INT,
+   rating INT,
+   weekday INT)
+ ROW FORMAT DELIMITED
+ FIELDS TERMINATED BY '\t';
+ 
+ INSERT OVERWRITE TABLE u_data_new
+ SELECT
+   TRANSFORM (userid, movieid, rating, unixtime)
+   USING 'python weekday_mapper.py'
+   AS (userid, movieid, rating, weekday)
+ FROM u_data;
+ 
+ SELECT weekday, COUNT(1)
+ FROM u_data_new
+ GROUP BY weekday;
+ }}}
+ 
+ '''Note: due to a bug in the parser, you must run the "INSERT OVERWRITE" query on a single line.'''
+ 
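For a quick sanity check of the per-weekday counts on a handful of rows, the GROUP BY can be mimicked locally. This is only an illustrative sketch; count_by_weekday and the sample rows are hypothetical and not taken from the wiki page.

```python
from collections import Counter


def count_by_weekday(rows):
    """Count rows per weekday; each row is (userid, movieid, rating, weekday)."""
    return Counter(weekday for _userid, _movieid, _rating, weekday in rows)


# Made-up rows in the u_data_new layout.
rows = [(196, 242, 3, 4), (186, 302, 3, 6), (22, 377, 1, 4)]
for weekday, n in sorted(count_by_weekday(rows).items()):
    print(weekday, n)
```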
  === Using sampling ===
  == Known Issues/Bugs ==
  
