hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/LanguageManual/DDL" by PhiloVivero
Date Mon, 02 May 2011 23:29:37 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/LanguageManual/DDL" page has been changed by PhiloVivero.
The comment on this change is: Clarify what to do when partitioned values are in the table
data.
http://wiki.apache.org/hadoop/Hive/LanguageManual/DDL?action=diff&rev1=87&rev2=88

--------------------------------------------------

  Use STORED BY to create a non-native table, for example in HBase.  See [[Hive/StorageHandlers]]
for more information on this option.
  
  Partitioned tables can be created using the PARTITIONED BY clause. A table can have one
or more partition columns and a separate data directory is created for each distinct value
combination in the partition columns. Further, tables or partitions can be bucketed using
CLUSTERED BY columns, and data can be sorted within each bucket via SORTED BY columns. This
can improve performance on certain kinds of queries.
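  
  As a rough sketch of how these clauses combine, a table that is partitioned, bucketed, and
sorted within buckets might be declared as follows. The table name, column names, and bucket
count are hypothetical and chosen only for illustration.
  
  {{{
  -- Hypothetical table: one data directory per (dt, country) value combination,
  -- rows hashed on userid into 32 buckets, each bucket sorted by viewTime.
  CREATE TABLE page_view_sketch (
    viewTime  INT,
    userid    BIGINT,
    page_url  STRING
  )
  PARTITIONED BY (dt STRING, country STRING)
  CLUSTERED BY (userid) SORTED BY (viewTime) INTO 32 BUCKETS;
  }}}
  
  Note that the partition columns (dt, country) appear only in the PARTITIONED BY clause, not
in the main column list.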
+ 
+ If you get the error "FAILED: Error in semantic analysis: Column repeated in partitioning
columns" when creating a partitioned table, you have declared the same column name both in the
table's column list and in the PARTITIONED BY clause. This usually happens because your data
really does contain the partition value as a column. However, the partition column is already
a pseudocolumn on which you can query, so you must rename the column in the table's column
list to something else (a name that users should not query on!).
+ 
+ Here is an example. Suppose your original table was this:
+ 
+ {{{
+ id     int,
+ date   date,
+ name   varchar
+ }}}
+ 
+ Now you want to partition on date. Your Hive definition would be this:
+ 
+ {{{
+ create table table_name (
+   id                int,
+   dtDontQuery       string,
+   name              string
+ )
+ partitioned by (date string)
+ }}}
+ 
+ Your users can still query on "where date = '...'", which now refers to the partition
column, while the second column (dtDontQuery) holds the original values from the data.
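+ 
+ As a sketch of how such a table might be loaded and queried, assume a hypothetical staging
table named staging_table with columns (id, orig_date, name) and an illustrative partition
value of '2011-05-02'. Neither the staging table nor the value comes from the page above.
+ 
+ {{{
+ -- Load one partition. The partition value is given in the PARTITION clause,
+ -- and the original date from the data lands in dtDontQuery.
+ insert overwrite table table_name partition (date = '2011-05-02')
+ select id, orig_date, name
+ from staging_table
+ where orig_date = '2011-05-02';
+ 
+ -- Users filter on the partition pseudocolumn, not on dtDontQuery.
+ select id, name
+ from table_name
+ where date = '2011-05-02';
+ }}}
+ 
+ Filtering on the partition column lets Hive read only the matching partition directory.
Depending on your Hive version, "date" may be treated as a reserved word and need to be quoted
with backticks.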
  
  Table names and column names are case insensitive but SerDe and property names are case
sensitive.  Table and column comments are string literals (single-quoted).  The TBLPROPERTIES
clause allows you to tag the table definition with your own metadata key/value pairs.
  
