hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/DML" by JoydeepSensarma
Date Thu, 22 Jan 2009 01:30:27 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by JoydeepSensarma:
http://wiki.apache.org/hadoop/Hive/LanguageManual/DML

------------------------------------------------------------------------------
    * relative path, eg: `project/data1`
    * absolute path, eg: `/user/hive/project/data1`
    * a full URI with scheme and (optionally) an authority, eg: `hdfs://namenode:9000/user/hive/project/data1`
-  * The target can be a table or a partition. If the table is partitioned, then one must
specify a specific partition of the table by specifying values for all of the partitioning
columns.
+  * The target being loaded to can be a table or a partition. If the table is partitioned,
then one must specify a specific partition of the table by specifying values for all of the
partitioning columns.
   * ''filepath'' can refer to a file (in which case Hive will move the file into the table)
or it can be a directory (in which case Hive will move all the files within that directory
into the table). In either case ''filepath'' addresses a set of files.
   * If the keyword LOCAL is specified, then:
    * the load command will look for ''filepath'' in the local file system. If a relative
path is specified, it will be interpreted relative to the current directory of the user.
The user can specify a full URI for local files as well, for example: `file:///user/hive/project/data1`
@@ -34, +34 @@

  ===== Notes =====
   * ''filepath'' cannot contain subdirectories.
   * If the keyword LOCAL is not used, ''filepath'' must refer to files within the same
filesystem as the table's (or partition's) location.
+  * Hive does some minimal checks to make sure that the files being loaded match the target
table. Currently it checks that if the table is stored in sequencefile format, the files
being loaded are also sequencefiles, and vice versa.
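
A minimal sketch of the load rules above, assuming the standard `LOAD DATA` statement form. The table `page_views` and its partition column `ds` are hypothetical names chosen for illustration; the paths are the example paths used earlier on this page.

{{{
-- Hypothetical table: page_views, partitioned by ds.
-- With LOCAL, the relative path is resolved against the user's current directory.
LOAD DATA LOCAL INPATH 'project/data1'
OVERWRITE INTO TABLE page_views PARTITION (ds='2009-01-21');

-- Without LOCAL, the path must be on the same filesystem as the table's location.
LOAD DATA INPATH '/user/hive/project/data1'
INTO TABLE page_views PARTITION (ds='2009-01-21');
}}}

Because the table is partitioned, both statements name a value for every partitioning column.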
  
- === Inserting data into tables from queries ===
+ === Inserting data into Hive Tables from queries ===
  
+ Query results can be inserted into tables by using the INSERT clause.
+ 
+ ===== Syntax =====
+ {{{
+ FROM from_statement
+ INSERT OVERWRITE TABLE tablename1 [PARTITION (partcol1=val1, partcol2=val2 ...)] select_statement1
+ [INSERT OVERWRITE TABLE tablename2 [PARTITION ...] select_statement2] ...
+ }}}
+ 
+ ===== Synopsis =====
+ 
+  * Inserts can be done to a table or a partition. If the table is partitioned, then one
must specify a specific partition of the table by specifying values for all of the partitioning
columns.
+  * Multiple insert clauses (also known as ''Multi Table Insert'') can be specified in the
same query.
+  * The output of each of the select statements is written to the chosen table (or partition).
Currently the OVERWRITE keyword is mandatory and implies that the contents of the chosen table
or partition are replaced with the output of the corresponding select statement.
+  * The output format and serialization class are determined by the table's metadata (as specified
via DDL commands on the table).
+ 
+ ===== Notes =====
+  * Multi Table Inserts minimize the number of data scans required. Hive can insert data
into multiple tables by scanning the input data just once (and applying different query operators
to the input data).
+ 
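
As an illustration of the syntax and the single-scan behaviour described above, here is a minimal sketch of a Multi Table Insert. All table and column names (`page_views`, `views_by_country`, `views_by_user`, `country`, `userid`, `ds`) are hypothetical and not part of Hive.

{{{
-- One scan of the hypothetical source table page_views feeds both inserts.
FROM page_views pv
INSERT OVERWRITE TABLE views_by_country PARTITION (ds='2009-01-21')
  SELECT pv.country, COUNT(1)
  WHERE pv.ds = '2009-01-21'
  GROUP BY pv.country
INSERT OVERWRITE TABLE views_by_user PARTITION (ds='2009-01-21')
  SELECT pv.userid, COUNT(1)
  WHERE pv.ds = '2009-01-21'
  GROUP BY pv.userid;
}}}

Each target is partitioned in this sketch, so each insert clause names a value for its partitioning column, and OVERWRITE replaces whatever that partition held before.
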
+ === Writing data into filesystem from queries ===
+ 
+ Query results can be inserted into filesystem directories by using a slight variation of
the syntax above:
+ 
+ ===== Syntax =====
+ {{{
+ FROM from_statement
+ INSERT OVERWRITE [LOCAL] DIRECTORY directory1 select_statement1
+ [INSERT OVERWRITE [LOCAL] DIRECTORY directory2 select_statement2] ...
+ }}}
+ 
+ ===== Synopsis =====
+  * ''directory'' can be a full URI. If the scheme or authority is not specified, Hive will use the
scheme and authority from the Hadoop configuration variable `fs.default.name`, which specifies the
Namenode URI.
+  * If the LOCAL keyword is used, Hive will write the data to the directory on the local file
system.
+  * Data written to the filesystem is serialized as text with columns separated by ^A and
rows separated by newlines. If any of the columns are not of primitive type, those columns
are serialized to JSON format.
+ 
+ ===== Notes =====
+  * Insert statements to directories, local directories and tables (or partitions) can all
be used together within the same query.
+  * Inserting into HDFS filesystem directories is the best way to extract large amounts of data
from Hive. Hive can write to HDFS directories in parallel from within a map-reduce job.
+ 
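
The directory variant can be sketched in the same way. The source table `page_views`, its columns, and both output paths below are hypothetical; the first insert writes to a directory resolved against `fs.default.name`, the second to the local file system.

{{{
-- Hypothetical table and paths, shown only to illustrate the syntax above.
FROM page_views pv
INSERT OVERWRITE DIRECTORY '/user/hive/export/views_by_country'
  SELECT pv.country, COUNT(1)
  GROUP BY pv.country
INSERT OVERWRITE LOCAL DIRECTORY '/tmp/views_sample'
  SELECT pv.*
  WHERE pv.country = 'US';
}}}

Each output file then holds one row per line, with the columns separated by ^A (the Ctrl-A character, byte value 1).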
