hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hive/GettingStarted" by RodrigoSchmidt
Date Thu, 05 Feb 2009 21:10:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by RodrigoSchmidt:
http://wiki.apache.org/hadoop/Hive/GettingStarted

------------------------------------------------------------------------------
  == Downloading and building ==
  
  Hive is available via SVN at: http://svn.apache.org/repos/asf/hadoop/hive/trunk
+ 
+ {{{
-   * $ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive
+   $ svn co http://svn.apache.org/repos/asf/hadoop/hive/trunk hive
-   * $ cd hive
+   $ cd hive
-   * $ ant package
+   $ ant package
-   * $ cd build/dist
+   $ cd build/dist
-   * $ ls
+   $ ls
-     * README.txt
+   README.txt
-     * bin/ (all the shell scripts)
+   bin/ (all the shell scripts)
-     * lib/ (required jar files)
+   lib/ (required jar files)
-     * conf/ (configuration files)
+   conf/ (configuration files)
-     * examples/ (sample input and query files)
+   examples/ (sample input and query files)
+ }}}
+ 
  
  In the rest of the page, we use build/dist and <install-dir> interchangeably.
  
@@ -43, +47 @@

  
  
  To use the Hive command line interface (CLI) from the shell:
+ {{{
-   * $ bin/hive
+   $ bin/hive
+ }}}
  
  == Using Hive ==
  === Configuration management overview ===
@@ -81, +87 @@

  
  Creating Hive tables and browsing through them
  
+ {{{
- hive> CREATE TABLE pokes (foo INT, bar STRING);  
+   hive> CREATE TABLE pokes (foo INT, bar STRING);  
+ }}}
  
  Creates a table called pokes with two columns, the first being an integer and the other a string.
  
+ {{{
- hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);  
+   hive> CREATE TABLE invites (foo INT, bar STRING) PARTITIONED BY (ds STRING);  
+ }}}
  
  Creates a table called invites with two columns and a partition column 
  called ds. The partition column is a virtual column. It is not part 
@@ -96, +106 @@

  By default, tables are assumed to be of text input format and the delimiters are assumed to be ^A (ctrl-a).
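As a sketch of this default format (the output path /tmp/kv_sample.txt is made up for illustration; the kv1.txt used later ships under examples/files in the source tree), a ^A-delimited file can be produced from the shell:

```shell
# Build a two-column data file in Hive's default text format, with
# columns separated by the ^A (ctrl-a, octal \001) character.
# /tmp/kv_sample.txt is a hypothetical path, not part of the Hive examples.
printf '1\001foo\n2\001bar\n' > /tmp/kv_sample.txt

# od -c makes the otherwise invisible \001 delimiter visible.
od -c /tmp/kv_sample.txt
```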
  
+ {{{
- hive> SHOW TABLES;
+   hive> SHOW TABLES;
+ }}}
  
  lists all the tables
  
+ {{{
- hive> SHOW TABLES '.*s';
+   hive> SHOW TABLES '.*s';
+ }}}
  
  lists all the tables that end with 's'. The pattern matching follows Java regular expressions. Check out this link for documentation
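To make the pattern concrete, here is a sketch using grep -E; POSIX extended regex syntax agrees with Java regexes for a simple pattern like '.*s', and SHOW TABLES matches the pattern against the whole table name. The names are the tables created above plus one made-up non-match:

```shell
# Only names ending in 's' survive the whole-name match of '.*s'.
# page_view is a hypothetical table name that does not match.
printf 'pokes\ninvites\nevents\npage_view\n' | grep -E '^.*s$'
# prints:
# pokes
# invites
# events
```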
@@ -112, +126 @@

  
  As for altering tables, table names can be changed and additional columns can be added:
  
+ {{{
- hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
+   hive> ALTER TABLE pokes ADD COLUMNS (new_col INT);
- 
- hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
+   hive> ALTER TABLE invites ADD COLUMNS (new_col2 INT COMMENT 'a comment');
- 
- hive> ALTER TABLE events RENAME TO 3koobecaf;
+   hive> ALTER TABLE events RENAME TO 3koobecaf;
+ }}}
  
  Dropping tables:
  
+ {{{
- hive> DROP TABLE pokes;
+   hive> DROP TABLE pokes;
- 
+ }}}
  
  == Metadata Store ==
  
@@ -148, +163 @@

  
  Loading data from flat files into Hive:
  
+ {{{
- hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes; 
+   hive> LOAD DATA LOCAL INPATH './examples/files/kv1.txt' OVERWRITE INTO TABLE pokes;
+ }}}
  
  Loads a file that contains two columns separated by ctrl-a into the pokes table. 
  'local' signifies that the input file is on the local file system. If 'local' 
@@ -164, +181 @@

    in hive-default.xml. We advise users to create this directory before 
    trying to create tables via Hive.
  
+ {{{
- hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
+   hive> LOAD DATA LOCAL INPATH './examples/files/kv2.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-15');
- 
- hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');
+   hive> LOAD DATA LOCAL INPATH './examples/files/kv3.txt' OVERWRITE INTO TABLE invites PARTITION (ds='2008-08-08');
+ }}}
  
  The two LOAD statements above load data into two different partitions of the table
  invites. Table invites must be created as partitioned by the key ds for this to succeed.
@@ -178, +196 @@

    of such queries can be controlled by the hadoop configuration variables.
  
    * The cli command 'SET' can be used to set any hadoop (or hive) configuration variable.
For example:
+ 
+ {{{
-     o hive> SET mapred.job.tracker=myhost.mycompany.com:50030
+     hive> SET mapred.job.tracker=myhost.mycompany.com:50030
- 
-     o hive> SET -v 
+     hive> SET -v 
+ }}}
+ 
    The latter shows all the current settings. Without the -v option only the 
    variables that differ from the base hadoop configuration are displayed
    * In particular, the number of reducers should be set to a reasonable number 
@@ -193, +214 @@

  More are available in the hive sources at ql/src/test/queries/positive
  
  ==== SELECTS and FILTERS ====
+ 
+ {{{
- hive> SELECT a.foo FROM invites a WHERE a.ds='<DATE>';
+   hive> SELECT a.foo FROM invites a WHERE a.ds='<DATE>';
+ }}}
  
  selects column 'foo' from all rows of partition <DATE> of invites table. The results
are not
  stored anywhere, but are displayed on the console.
@@ -201, +225 @@

  Note that in all the examples that follow, INSERT (into a hive table, local 
  directory or HDFS directory) is optional. 
  
+ {{{
- hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='<DATE>';
+   hive> INSERT OVERWRITE DIRECTORY '/tmp/hdfs_out' SELECT a.* FROM invites a WHERE a.ds='<DATE>';
+ }}}
  
  selects all rows from partition <DATE> of the invites table into an HDFS directory. The
result data 
  is in files (depending on the number of mappers) in that directory.
@@ -210, +236 @@

  
  Partitioned tables must always have a partition selected in the WHERE clause of the statement.
  
+ {{{
- hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;
+   hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/local_out' SELECT a.* FROM pokes a;
+ }}}
  
  Selects all rows from pokes table into a local directory
  
+ {{{
- hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
+   hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a;
- 
- hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
+   hive> INSERT OVERWRITE TABLE events SELECT a.* FROM profiles a WHERE a.key < 100;
- 
- hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
+   hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/reg_3' SELECT a.* FROM events a;
- 
- hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites, a.pokes FROM profiles a;
+   hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_4' select a.invites, a.pokes FROM profiles a;
- 
- hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a WHERE a.ds='<DATE>';
+   hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT COUNT(1) FROM invites a WHERE a.ds='<DATE>';
- 
- hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
+   hive> INSERT OVERWRITE DIRECTORY '/tmp/reg_5' SELECT a.foo, a.bar FROM invites a;
- 
- hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
+   hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/sum' SELECT SUM(a.pc) FROM pc1 a;
+ }}}
  
  Sum of a column. avg, min, max can also be used
  
  ==== GROUP BY ====
  
+ {{{
- hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(1) WHERE a.foo > 0 GROUP BY a.bar;
+   hive> FROM invites a INSERT OVERWRITE TABLE events SELECT a.bar, count(1) WHERE a.foo > 0 GROUP BY a.bar;
- 
- hive> INSERT OVERWRITE TABLE events SELECT a.bar, count(1) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;
+   hive> INSERT OVERWRITE TABLE events SELECT a.bar, count(1) FROM invites a WHERE a.foo > 0 GROUP BY a.bar;
+ }}}
  
  ==== JOIN ====
  
+ {{{
- hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;
+   hive> FROM pokes t1 JOIN invites t2 ON (t1.bar = t2.bar) INSERT OVERWRITE TABLE events SELECT t1.bar, t1.foo, t2.foo;
- 
+ }}}
  ==== MULTITABLE INSERT ====
  
- <source>
+ {{{
    FROM src
    INSERT OVERWRITE TABLE dest1 SELECT src.* WHERE src.key < 100
    INSERT OVERWRITE TABLE dest2 SELECT src.key, src.value WHERE src.key >= 100 and src.key < 200
    INSERT OVERWRITE TABLE dest3 PARTITION(ds='2008-04-08', hr='12') SELECT src.key WHERE src.key >= 200 and src.key < 300
    INSERT OVERWRITE LOCAL DIRECTORY '/tmp/dest4.out' SELECT src.value WHERE src.key >= 300;
- </source>
+ }}}
  
  ==== STREAMING ====
  
+ {{{
- hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';
+   hive> FROM invites a INSERT OVERWRITE TABLE events SELECT TRANSFORM(a.foo, a.bar) AS (oof, rab) USING '/bin/cat' WHERE a.ds > '2008-08-09';
+ }}}
  
  This streams the data in the map phase through the script /bin/cat (like hadoop streaming).

  Similarly, streaming can be used on the reduce side (please see the Hive Tutorial or examples).
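The row format a TRANSFORM script sees can be sketched outside Hive. One assumption worth flagging: by default Hive serializes the selected columns as tab-separated lines on the script's stdin and parses tab-separated lines from its stdout, which is why a pass-through like /bin/cat works unchanged. A slightly less trivial stand-in for '/bin/cat' might look like:

```shell
# Rows arrive on stdin as tab-separated columns, one row per line.
# This awk filter upper-cases the second column and writes rows back
# in the same tab-separated form Hive expects on stdout.
printf '1\tfoo\n2\tbar\n' | awk -F'\t' '{ print $1 "\t" toupper($2) }'
# prints:
# 1	FOO
# 2	BAR
```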
