hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/LanguageManual/DML" by JoydeepSensarma
Date Thu, 22 Jan 2009 00:37:07 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The following page has been changed by JoydeepSensarma:

New page:
There are two primary ways of manipulating data in Hive:

=== Loading files into tables ===

Hive does not do any transformation while loading data into tables. Load operations are currently
pure copy/move operations that move data files into locations corresponding to Hive tables.

===== Syntax =====
LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1,
partcol2=val2 ...)]
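
To make the optional clauses concrete, here is a small Python sketch that assembles statements following the syntax above. The helper `build_load_statement`, the table name `page_view`, and the partition columns `dt` and `country` are hypothetical names chosen for illustration. The sketch quotes every partition value as a string and does no escaping, so treat it as an illustration of clause order and not as a safe statement generator.

```python
def build_load_statement(filepath, table, partition=None, local=False, overwrite=False):
    """Assemble a LOAD DATA statement in the clause order given by the syntax.

    Hypothetical helper: no escaping is performed, and partition values are
    always quoted as strings (an assumption made for this illustration).
    """
    parts = ["LOAD DATA"]
    if local:
        parts.append("LOCAL")
    parts.append(f"INPATH '{filepath}'")
    if overwrite:
        parts.append("OVERWRITE")
    parts.append(f"INTO TABLE {table}")
    if partition:
        # partition maps each partitioning column to its value
        spec = ", ".join(f"{col}='{val}'" for col, val in partition.items())
        parts.append(f"PARTITION ({spec})")
    return " ".join(parts)


# Load into an unpartitioned table (hypothetical table name).
print(build_load_statement("/user/hive/project/data1", "page_view"))

# Load local files into one partition, replacing its current contents.
print(build_load_statement(
    "project/data1",
    "page_view",
    partition={"dt": "2009-01-21", "country": "US"},
    local=True,
    overwrite=True,
))
```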

===== Synopsis =====

Load operations are currently pure copy/move operations that move data files into locations corresponding
to Hive tables.
 * ''filepath'' can be:
  * a relative path, e.g. `project/data1`
  * an absolute path, e.g. `/user/hive/project/data1`
  * a full URI with a scheme and (optionally) an authority, e.g. `hdfs://namenode:9000/user/hive/project/data1`
 * The target can be a table or a partition. If the table is partitioned, the target must be
a specific partition, identified by values for all of the partitioning columns.
 * ''filepath'' can refer to a file (in which case Hive will move the file into the table)
or to a directory (in which case Hive will move all the files within that directory
into the table). In either case ''filepath'' addresses a set of files.
 * If the keyword LOCAL is specified, then:
  * the load command will look for ''filepath'' in the local file system. A relative path
is interpreted relative to the user's current working directory. The user can also specify
a full URI for local files, for example: `file:///user/hive/project/data1`
  * the load command will try to copy all the files addressed by ''filepath'' to the target
file system. The target file system is inferred from the location attribute of the
table. The copied data files will then be moved to the table.
 * If the keyword LOCAL is ''not'' specified, then Hive will use the full URI of ''filepath''
if one is specified. Otherwise the following rules are applied:
  * If the scheme or authority is not specified, Hive will use the scheme and authority from
the Hadoop configuration variable `fs.default.name`, which specifies the Namenode URI.
  * If the path is not absolute, Hive will interpret it relative to `/user/<username>`
  * Hive will ''move'' the files addressed by ''filepath'' into the table (or partition)
 * If the OVERWRITE keyword is used, then the contents of the target table (or partition) will
be deleted and replaced with the files referred to by ''filepath''. Otherwise the files referred
to by ''filepath'' will be added to the table.
  * Note that if the target table (or partition) already has a file whose name collides with
any of the file names contained in ''filepath'', then the existing file will be replaced with
the new file.
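
The resolution rules for the non-LOCAL case, and the effect of OVERWRITE, can be sketched in Python as follows. This is an illustrative model of the behaviour described above, not Hive's actual code: `resolve_load_path` and `apply_load` are hypothetical helpers, the `default_fs` argument stands in for the value of `fs.default.name`, and any ''filepath'' that carries a scheme is simply treated as a full URI.

```python
from urllib.parse import urlsplit, urlunsplit


def resolve_load_path(filepath, default_fs, username):
    """Resolve a non-LOCAL filepath to a full URI (illustrative model only).

    default_fs plays the role of fs.default.name, e.g. 'hdfs://namenode:9000'.
    """
    parts = urlsplit(filepath)
    if parts.scheme:
        # Assumption: anything with a scheme is treated as a full URI and used as-is.
        return filepath
    default = urlsplit(default_fs)
    path = parts.path
    if not path.startswith("/"):
        # Relative paths are interpreted relative to /user/<username>.
        path = f"/user/{username}/{path}"
    # Scheme and authority come from the default file system.
    return urlunsplit((default.scheme, default.netloc, path, "", ""))


def apply_load(table_files, incoming, overwrite=False):
    """Model a table's contents after a load; dicts map file name -> contents."""
    # OVERWRITE discards the existing files; otherwise they are kept.
    result = {} if overwrite else dict(table_files)
    # A new file replaces any existing file with the same name.
    result.update(incoming)
    return result


print(resolve_load_path("project/data1", "hdfs://namenode:9000", "alice"))
print(resolve_load_path("/user/hive/project/data1", "hdfs://namenode:9000", "alice"))
print(apply_load({"a": "old", "b": "old"}, {"b": "new", "c": "new"}))
print(apply_load({"a": "old", "b": "old"}, {"b": "new", "c": "new"}, overwrite=True))
```

The user name `alice` and the file names are placeholders. Modelling the table as a dictionary keyed by file name makes the collision rule visible: without OVERWRITE, file `a` survives, `b` is replaced, and `c` is added.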

===== Notes =====
 * ''filepath'' cannot contain subdirectories.
 * If the keyword LOCAL is not used, ''filepath'' must refer to files within the same
file system as the table's (or partition's) location.
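
One way to picture the same-file-system restriction is as a comparison of the scheme and authority of two full URIs. The Python sketch below is only a model under that assumption. `same_filesystem` is a hypothetical helper, both arguments are assumed to be fully resolved URIs already, and the check Hive actually performs may differ.

```python
from urllib.parse import urlsplit


def same_filesystem(source_uri, table_location):
    """Return True when two full URIs share a scheme and authority.

    Assumption: "same file system" is modelled as an exact match on the
    (scheme, authority) pair; this is not taken from Hive's implementation.
    """
    src = urlsplit(source_uri)
    dst = urlsplit(table_location)
    return (src.scheme, src.netloc) == (dst.scheme, dst.netloc)


# Hypothetical table location on the default Namenode.
table_location = "hdfs://namenode:9000/user/hive/warehouse/page_view"

print(same_filesystem("hdfs://namenode:9000/user/hive/project/data1", table_location))
print(same_filesystem("hdfs://othernode:9000/user/hive/project/data1", table_location))
```

The host names and the warehouse path are placeholders. Under this model the first source would be accepted for a non-LOCAL load and the second would not.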

=== Inserting data into tables from queries ===
