hadoop-common-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Hadoop Wiki] Trivial Update of "Hive/Tutorial" by StevenWong
Date Wed, 04 May 2011 22:38:43 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Hadoop Wiki" for change notification.

The "Hive/Tutorial" page has been changed by StevenWong.
The comment on this change is: Fix typo.
http://wiki.apache.org/hadoop/Hive/Tutorial?action=diff&rev1=36&rev2=37

--------------------------------------------------

              MAP KEYS TERMINATED BY '3'
      STORED AS SEQUENCEFILE;
  }}}
- In this example the columns that comprise the table row are specified in a similar way
to the definition of types. Comments can be attached at both the column level and the
table level. Additionally, the PARTITIONED BY clause defines the partitioning columns, which
are different from the data columns and are not actually stored with the data. The bucketed
on clause specifies which column to use for bucketing as well as how many buckets to create.
The delimited row format specifies how the rows are stored in the Hive table. In the case
of the delimited format, this specifies how the fields are terminated, how the items within
collections (arrays or maps) are terminated and how the map keys are terminated. STORED AS
SEQUENCEFILE indicates that this data is stored in a binary format (using Hadoop SequenceFiles)
on HDFS. The values shown for the ROW FORMAT and STORED AS clauses in the above example represent
the system defaults.
+ In this example the columns that comprise the table row are specified in a similar way
to the definition of types. Comments can be attached at both the column level and the
table level. Additionally, the PARTITIONED BY clause defines the partitioning columns, which
are different from the data columns and are not actually stored with the data. The CLUSTERED
BY clause specifies which column to use for bucketing as well as how many buckets to create.
The delimited row format specifies how the rows are stored in the Hive table. In the case
of the delimited format, this specifies how the fields are terminated, how the items within
collections (arrays or maps) are terminated and how the map keys are terminated. STORED AS
SEQUENCEFILE indicates that this data is stored in a binary format (using Hadoop SequenceFiles)
on HDFS. The values shown for the ROW FORMAT and STORED AS clauses in the above example represent
the system defaults.
  
  Table names and column names are case insensitive.
  
@@ -501, +501 @@

      FROM pv_gender_sum;
  }}}
  == Sampling ==
- The sampling clause allows users to write queries for samples of the data instead of
the whole table. Currently the sampling is done on the columns that are specified in the BUCKETED
ON clause of the CREATE TABLE statement. In the following example we choose the 3rd bucket out
of the 32 buckets of the pv_gender_sum table:
+ The sampling clause allows users to write queries for samples of the data instead of
the whole table. Currently the sampling is done on the columns that are specified in the CLUSTERED
BY clause of the CREATE TABLE statement. In the following example we choose the 3rd bucket out
of the 32 buckets of the pv_gender_sum table:
  
  {{{
      INSERT OVERWRITE TABLE pv_gender_sum_sample

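To illustrate how the CLUSTERED BY clause and bucket sampling described above fit together, here is a minimal Python sketch that simulates spreading rows across 32 buckets and then picking the 3rd bucket, in the spirit of Hive's TABLESAMPLE(BUCKET 3 OUT OF 32). It is a simplification under stated assumptions: the clustering column is taken to be an integer that hashes to its own value, and the bucket is that hash (masked to be non-negative) modulo the bucket count. Hive's real hash depends on the column type, and the function names (`bucket_of`, `assign_buckets`, `tablesample`) and key values are hypothetical.

```python
from typing import Dict, List

# Assumption: an integer clustering key hashes to its own value; masking with
# INT_MAX keeps the result non-negative, as with a 32-bit signed hash.
INT_MAX = 0x7FFFFFFF


def bucket_of(key: int, num_buckets: int) -> int:
    """Return the 0-based bucket index for an integer clustering key."""
    return (key & INT_MAX) % num_buckets


def assign_buckets(keys: List[int], num_buckets: int) -> Dict[int, List[int]]:
    """Group rows (represented only by their clustering key) into buckets."""
    buckets: Dict[int, List[int]] = {b: [] for b in range(num_buckets)}
    for key in keys:
        buckets[bucket_of(key, num_buckets)].append(key)
    return buckets


def tablesample(keys: List[int], x: int, y: int) -> List[int]:
    """Sketch of TABLESAMPLE(BUCKET x OUT OF y): keep rows whose bucket,
    computed with y buckets, is the x-th one (x is 1-based)."""
    return [key for key in keys if bucket_of(key, y) == x - 1]


if __name__ == "__main__":
    keys = list(range(1, 101))          # hypothetical clustering key values
    buckets = assign_buckets(keys, 32)  # 32 buckets, as in the example table
    sample = tablesample(keys, 3, 32)   # "3rd bucket out of 32"
    print(sample)                       # [2, 34, 66, 98]
    assert sample == buckets[2]         # the sample is exactly one bucket
```

Because the sample is computed with the same hash and bucket count as the table's clustering, selecting bucket x out of y (with y equal to the table's bucket count) reads exactly one bucket rather than scanning the whole table, which is why sampling is tied to the columns in the CLUSTERED BY clause.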