hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Update of "Hive/Tutorial" by LarryOgrodnek
Date Wed, 02 Feb 2011 02:52:41 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/Tutorial" page has been changed by LarryOgrodnek.
The comment on this change is: small comment indicating that dynamic column values are taken
from the end of the select clause.
http://wiki.apache.org/hadoop/Hive/Tutorial?action=diff&rev1=31&rev2=32

--------------------------------------------------

  
  There are several syntactic differences from the multi-insert statement: 
    * country appears in the PARTITION specification, but with no value associated. In this
case, country is a ''dynamic partition column''. On the other hand, ds has a value associated
with it, which means it is a ''static partition column''. If a column is a dynamic partition
column, its value comes from the input column. Currently, dynamic partition columns are only
allowed as the last column(s) in the partition clause, because the order of the partition columns
indicates their hierarchy (meaning dt is the root partition, and country is the child
partition). You cannot specify a partition clause with (dt, country='US') because that would
mean updating every partition, for any date, whose country sub-partition is 'US'. 
-   * An additional pvs.country column is added in the select statement. This is the corresponding
input column for the dynamic partition column. Note that you do not need to add an input column
for the static partition column because its value is already known in the PARTITION clause.

+   * An additional pvs.country column is added in the select statement. This is the corresponding
input column for the dynamic partition column. Note that you do not need to add an input column
for the static partition column because its value is already known in the PARTITION clause.
Note that the dynamic partition values are selected by ordering, not name, and taken as the
last columns from the select clause.
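
The positional rule described above can be modelled outside Hive. The following is only a minimal Python sketch, not Hive's implementation: the function name split_select_row, the columns userid and page_url, and the date value are hypothetical; only ds and country come from the text.

```python
def split_select_row(select_row, static_spec, dynamic_cols):
    """Split one SELECT output row into data values and a full partition spec.

    select_row   -- list of (column_name, value) pairs, in select-clause order
    static_spec  -- dict of static partition columns (values known up front)
    dynamic_cols -- dynamic partition column names, in PARTITION-clause order
    """
    n = len(dynamic_cols)
    if n > len(select_row):
        raise ValueError("fewer select columns than dynamic partition columns")
    split = len(select_row) - n
    data, tail = select_row[:split], select_row[split:]
    spec = dict(static_spec)
    # Pairing is purely positional: the select column's own name is ignored.
    for col, (_name, value) in zip(dynamic_cols, tail):
        spec[col] = value
    return [value for _name, value in data], spec


static = {"ds": "2008-06-08"}  # hypothetical static partition value

# country is the last select column, so it feeds the dynamic partition.
good = [("userid", "u1"), ("page_url", "/home"), ("country", "US")]
good_data, good_spec = split_select_row(good, static, ["country"])

# country is NOT last: the last column (page_url) is used as the partition
# value, even though a column named country exists in the select clause.
bad = [("userid", "u1"), ("country", "US"), ("page_url", "/home")]
bad_data, bad_spec = split_select_row(bad, static, ["country"])
```

In the second call the partition value for country ends up being "/home", which is the kind of mistake the added note warns about.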
  
  Semantics of the dynamic partition insert statement:
    * When non-empty partitions already exist for the dynamic partition columns
(e.g., country='CA' exists under some ds root partition), an existing partition is overwritten
if the dynamic partition insert sees the same value (say 'CA') in the input data. This is in
line with the 'insert overwrite' semantics. However, if the partition value 'CA' does not
appear in the input data, the existing partition is not overwritten. 
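
The overwrite rule above can be illustrated with a small Python model. This is a sketch under stated assumptions, not Hive code: the table is represented as a dict keyed by (ds, country), and the function name insert_overwrite_dynamic and all sample values are hypothetical.

```python
def insert_overwrite_dynamic(table, static_ds, rows):
    """Model a dynamic partition 'insert overwrite' under one static ds value.

    table     -- dict mapping (ds, country) to the list of rows in that partition
    static_ds -- value of the static partition column
    rows      -- list of (data, country) pairs produced by the select clause
    """
    # Group the input rows by the partition they fall into.
    new_partitions = {}
    for data, country in rows:
        new_partitions.setdefault((static_ds, country), []).append(data)
    # Only partitions whose value appears in the input are replaced (or created);
    # every other existing partition is carried over untouched.
    result = dict(table)
    result.update(new_partitions)
    return result


before = {
    ("2008-06-08", "CA"): ["old-ca"],
    ("2008-06-08", "US"): ["old-us"],
}
after = insert_overwrite_dynamic(
    before,
    "2008-06-08",
    [("new-us-1", "US"), ("new-us-2", "US"), ("new-uk", "UK")],
)
```

Here the US partition is replaced by the two new rows, a UK partition is created, and the CA partition keeps its old contents because 'CA' never appears in the input.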
