hadoop-common-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/DDL" by HeYongqiang
Date Wed, 20 Jan 2010 22:39:00 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/DDL" page has been changed by HeYongqiang.
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff&rev1=42&rev2=43

--------------------------------------------------

  
  You must specify list of a columns for tables with native SerDe. Refer to the Types part
of the User Guide for the allowable column types. A list of columns for tables with custom
SerDe may be specified but Hive will query the SerDe to determine the actual list of columns
for this table.
  
- Use STORED AS TEXTFILE if the data needs to be stored as plain text files. Use STORED AS
SEQUENCEFILE if the data needs to be compressed. Please read more about CompressedStorage
if you are planning to keep data compressed in your Hive tables.  Use INPUTFORMAT and OUTPUTFORMAT
to specify the name of a corresponding Input``Format and Output``Format class as a string
literal, e.g. 'org.apache.hadoop.hive.contrib.fileformat.base64.Base``64``Text``Input``Format'.
+ Use STORED AS TEXTFILE if the data needs to be stored as plain text files. Use STORED AS
SEQUENCEFILE if the data needs to be compressed. Please read more about CompressedStorage
if you are planning to keep data compressed in your Hive tables.  Use INPUTFORMAT and OUTPUTFORMAT
to specify the name of a corresponding InputFormat and OutputFormat class as a string literal,
e.g. 'org.apache.hadoop.hive.contrib.fileformat.base64.Base64TextInputFormat'.
  
  Partitioned tables can be created using the PARTITIONED BY clause. A table can have one
or more partition columns and a separate data directory is created for each distinct value
combination in the partition columns. Further, tables or partitions can be bucketed using
CLUSTERED BY columns, and data can be sorted within that bucket via SORT BY columns. This
can improve performance on certain kinds of queries.
  
@@ -143, +143 @@

  CREATE TABLE empty_key_value_store
  LIKE key_value_store;
  }}}
- 
  In contrast, the statement above creates a new empty_key_value_store table whose definition
exactly matches the existing key_value_store in all particulars other than table name.  The
new table contains no rows.
  
  ==== Inserting Data Into Bucketed Tables ====
@@ -161, +160 @@

  
  See the next section on ALTER TABLE for how to drop partitions.
  
- 
  == Alter Table Statements ==
  Alter table statements enable you to change the structure of an existing table. You can
add columns/partitions, change serde, add table and serde properties, or rename the table
itself.
  
  === Add Partitions ===
- 
  {{{
  ALTER TABLE table_name ADD partition_spec [ LOCATION 'location1' ] partition_spec [ LOCATION
'location2' ] ...
  
  partition_spec:
    : PARTITION (partition_col = partition_col_value, partition_col = partiton_col_value,
...)
  }}}
- 
  You can use ALTER TABLE ADD PARTITION to add partitions to a table.  Partition values should
be quoted only if they are strings.
  
  {{{
  alter table page_view add partition (dt='2008-08-08', country='us') location '/path/to/us/part080808'
partition (dt='2008-08-09', country='us') location '/path/to/us/part080809';
  }}}
- 
  === Drop Partitions ===
- 
  {{{
  ALTER TABLE table_name DROP partition_spec, partition_spec,...
  }}}
- 
  You can use ALTER TABLE DROP PARTITION to drop a partition for a table. This removes the
data and metadata for this partition.
  
  {{{
  alter table page_view drop partition(dt='2008-08-08', country='us');
  }}}
- 
  === Rename Table ===
  {{{
  ALTER TABLE table_name RENAME TO new_table_name
@@ -202, +194 @@

  {{{
  ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment]
[FIRST|AFTER column_name]
  }}}
- 
  This command will allow users to change a column's name, data type, comment, or position,
or an arbitrary combination of them.
  
- Example:
- create table test_change (a int, b int, c int);
+ Example: create table test_change (a int, b int, c int);
  
  "alter table change a a1 int;" will change column a's name to a1.
  
@@ -244, +234 @@

  This statement enables you to add user defined metadata to table SerDe object. The serde
properties are passed to the table's SerDe when it is being initialized by Hive to serialize
and deserialize data. So users can store any information required for their custom serde here.
Refer to SerDe section of Users Guide for more information.
  
  === Alter Table File Format and Organization ===
- 
  {{{
  ALTER TABLE table_name SET FILE FORMAT file_format
  ALTER TABLE table_name CLUSTERED BY (col_name, col_name, ...) [SORTED BY (col_name, ...)]
INTO num_buckets BUCKETS
  }}}
- 
  These statements change the table's physical storage properties.  For available file_format
options, see the section above on CREATE TABLE.
  
  NOTE: These commands will only modify Hive's metadata, and will NOT reorganize or reformat
existing data. Users should make sure the actual data layout conforms with the metadata definition.
  
  == Create/Drop Function ==
- 
  === Create Function ===
  {{{
  CREATE TEMPORARY FUNCTION function_name AS class_name
@@ -263, +250 @@

  This statement lets you create a function that is implemented by the class_name. You can
use this function in Hive queries as long as the session lasts. You can use any class that
is in the class path of Hive. You can add jars to class path by executing 'ADD FILES' statements.
Please refer to the CLI section in the User Guide for more information on how to add/delete
files from the Hive classpath. Using this, you can register User Defined Functions (UDF's).
  
  === Drop Function ===
- 
  You can unregister a UDF as follows:
  
  {{{
  DROP TEMPORARY FUNCTION function_name
  }}}
- 
  == Show/Describe Statements ==
  These statements provide a way to query the Hive metastore for existing data and metadata
accessible to this Hive system.
  
@@ -285, +270 @@

  }}}
  SHOW PARTITIONS lists all the existing partitions for a given table. Partitions are listed
in alphabetical order.
  
- === Show Table/Partitions Extended===
+ === Show Table/Partitions Extended ===
  {{{
  SHOW TABLE EXTENDED [ IN|FROM database_name] LIKE identifier_with_wildcards [PARTITION(partition_desc)]
  }}}
+ SHOW TABLE EXTENDED will list information for all the tables matching the given regular
expression. If a partition specification is present, it will output table information plus
given partition's file system information. Users can not use regular expression for table
name if a partition specification is present. This command's output includes table information
and file system information like totalNumberFiles, totalFileSize, maxFileSize, minFileSize,lastAccessTime,
and lastUpdateTime. If partition is present, it will output the given partition's file system
information.
- SHOW TABLE EXTENDED will list information for all the tables matching the given regular
expression. If a partition specification is present, it will output table information plus
given partition's file system information. Users can not use regular expression for table
name if a partition specification is present.
- This command's output includes table information and file system information like totalNumberFiles,
totalFileSize, maxFileSize, minFileSize,lastAccessTime, and lastUpdateTime. If partition is
present, it will output the given partition's file system information.
  
  === Show Functions ===
  {{{
  SHOW FUNCTIONS "a.*"
  }}}
  SHOW FUNCTIONS lists all the user defined and builtin functions matching the regular expression.
To get all functions use ".*"
- 
  
  === Describe Table/Column ===
  {{{

Mime
View raw message