From: hashutosh@apache.org
To: hcatalog-commits@incubator.apache.org
Subject: svn commit: r1178252 [31/34] - in /incubator/hcatalog/site: author/src/documentation/content/xdocs/ publish/docs/ publish/docs/r0.2.0/ publish/docs/r0.2.0/api/ publish/docs/r0.2.0/api/org/ publish/docs/r0.2.0/api/org/apache/ publish/docs/r0.2.0/api/org...
Date: Sun, 02 Oct 2011 21:05:30 -0000
Message-Id: <20111002210541.79A0F2388C7C@eris.apache.org>

Added: incubator/hcatalog/site/publish/docs/r0.2.0/api/overview-frame.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/api/overview-frame.html?rev=1178252&view=auto
==============================================================================

Overview List (HCatalog 0.2.0-incubating API)
Added: incubator/hcatalog/site/publish/docs/r0.2.0/api/overview-summary.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/api/overview-summary.html?rev=1178252&view=auto
==============================================================================


+ + + + + + + + + + + + + + + +
+ +
+ + + +
+
+

+HCatalog 0.2.0-incubating API +

+
+Overview +

+See: +
+          Description +

+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
hcatalog

org.apache.hcatalog.cli
org.apache.hcatalog.cli.SemanticAnalysis
org.apache.hcatalog.common
org.apache.hcatalog.data
org.apache.hcatalog.data.schema
org.apache.hcatalog.har
org.apache.hcatalog.listener
org.apache.hcatalog.mapreduce
org.apache.hcatalog.oozie
org.apache.hcatalog.pig
org.apache.hcatalog.pig.drivers
org.apache.hcatalog.rcfile

+

+

Overview

+ + + +

HCatalog

+
+

HCatalog is a table management and storage management layer for Hadoop that enables users with different data processing tools – Pig, MapReduce, Hive, Streaming – to more easily read and write data on the grid. HCatalog’s table abstraction presents users with a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users need not worry about where or in what format their data is stored – RCFile format, text files, sequence files.

+

(Note: In this release, Streaming is not supported. Also, HCatalog supports only writing RCFile formatted files and only reading PigStorage formated text files.)

+

HCatalog Architecture

HCatalog is built on top of the Hive metastore and incorporates components of the Hive DDL. HCatalog provides read and write interfaces for Pig and MapReduce, and a command line interface for data definitions.

(Note: HCatalog notification is not available in this release.)
Interfaces

The HCatalog interface for Pig – HCatLoader and HCatStorer – is an implementation of the Pig load and store interfaces. HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement. HCatStorer accepts a table to write to and a specification of partition keys to create a new partition. Currently HCatStorer only supports writing to one partition. HCatLoader and HCatStorer are implemented on top of HCatInputFormat and HCatOutputFormat, respectively.
The HCatalog interface for MapReduce – HCatInputFormat and HCatOutputFormat – is an implementation of the Hadoop InputFormat and OutputFormat interfaces. HCatInputFormat accepts a table to read data from and a selection predicate to indicate which partitions to scan. HCatOutputFormat accepts a table to write to and a specification of partition keys to create a new partition. Currently HCatOutputFormat only supports writing to one partition.
Note: Currently there is no Hive-specific interface. Since HCatalog uses Hive's metastore, Hive can read data in HCatalog directly, as long as a SerDe for that data already exists. In the future we plan to write an HCatalogSerDe so that users won't need storage-specific SerDes, and so that Hive users can write data to HCatalog. Currently, one case is already supported: if a Hive user writes data in the RCFile format, it is possible to read the data through HCatalog.
Data is defined using HCatalog's command line interface (CLI). The HCatalog CLI supports most of the DDL portion of Hive's query language, allowing users to create, alter, and drop tables. The CLI also supports the data exploration parts of the Hive command line, such as SHOW TABLES and DESCRIBE TABLE.
Data Model
HCatalog presents a relational view of data in HDFS. Data is stored in tables, and these tables can be placed in databases. Tables can also be partitioned on one or more keys; that is, for a given value of a key (or set of keys) there will be one partition that contains all rows with that value (or set of values). For example, if a table is partitioned on date and there are three days of data in the table, there will be three partitions in the table. New partitions can be added to a table, and partitions can be dropped from a table. Partitioned tables have no partitions at create time. Unpartitioned tables effectively have one default partition that must be created at table creation time. There is no guaranteed read consistency when a partition is dropped.
Partitions contain records. Once a partition is created, records cannot be added to it, removed from it, or updated in it. (In the future some ability to integrate changes into a partition will be added.) Partitions are multi-dimensional, not hierarchical. Records are divided into columns. Columns have a name and a datatype. HCatalog supports the same datatypes as Hive.
Data Flow Example
This simple data flow example shows how HCatalog is used to move data from the grid into a database. From the database, the data can then be analyzed using Hive.
First, Joe in data acquisition uses distcp to get data onto the grid:
hadoop distcp file:///file.dat hdfs://data/rawevents/20100819/data

hcat -e "alter table rawevents add partition (date='20100819') location 'hdfs://data/rawevents/20100819/data'"
Second, Sally in data processing uses Pig to cleanse and prepare the data.

Without HCatalog, Sally must be manually informed by Joe that data is available, or must use Oozie to poll HDFS:
A = load '/data/rawevents/20100819/data' as (alpha:int, beta:chararray, …);
B = filter A by bot_finder(zeta) == 0;
…
store Z into '/data/processedevents/20100819/data';

With HCatalog, Oozie will be notified by HCatalog data is available and can then start the Pig job

+
A = load 'rawevents' using HCatLoader();
B = filter A by date == '20100819' and bot_finder(zeta) == 0;
…
store Z into 'processedevents' using HCatStorer("date=20100819");

+Third Robert in client management uses Hive to analyze his clients' results.

+

Without HCatalog, Robert must alter the table to add the required partition.

+
alter table processedevents add partition (date='20100819') location 'hdfs://data/processedevents/20100819/data';

select advertiser_id, count(clicks)
from processedevents
where date = '20100819'
group by advertiser_id;

With HCatalog, Robert does not need to modify the table structure.

+
select advertiser_id, count(clicks)
from processedevents
where date = '20100819'
group by advertiser_id;
Added: incubator/hcatalog/site/publish/docs/r0.2.0/api/overview-tree.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/api/overview-tree.html?rev=1178252&view=auto
==============================================================================

Class Hierarchy (HCatalog 0.2.0-incubating API)
Hierarchy For All Packages
Package Hierarchies:
org.apache.hcatalog.cli, org.apache.hcatalog.cli.SemanticAnalysis, org.apache.hcatalog.common, org.apache.hcatalog.data, org.apache.hcatalog.data.schema, org.apache.hcatalog.har, org.apache.hcatalog.listener, org.apache.hcatalog.mapreduce, org.apache.hcatalog.oozie, org.apache.hcatalog.pig, org.apache.hcatalog.pig.drivers, org.apache.hcatalog.rcfile
Class Hierarchy

+Interface Hierarchy +

+
    +
  • java.lang.Comparable<T>
      +
    • org.apache.hcatalog.data.HCatRecordable
    • org.apache.hadoop.io.WritableComparable<T> (also extends org.apache.hadoop.io.Writable) + +
    +
  • org.apache.hadoop.io.Writable
      +
    • org.apache.hcatalog.data.HCatRecordable
    • org.apache.hadoop.io.WritableComparable<T> (also extends java.lang.Comparable<T>) + +
    +
+

+Enum Hierarchy +

+ +
+ + + + + + + + + + + + + + + +
+ +
+ + + +
Added: incubator/hcatalog/site/publish/docs/r0.2.0/api/package-list
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/api/package-list?rev=1178252&view=auto
==============================================================================
org.apache.hcatalog.cli
org.apache.hcatalog.cli.SemanticAnalysis
org.apache.hcatalog.common
org.apache.hcatalog.data
org.apache.hcatalog.data.schema
org.apache.hcatalog.har
org.apache.hcatalog.listener
org.apache.hcatalog.mapreduce
org.apache.hcatalog.oozie
org.apache.hcatalog.pig
org.apache.hcatalog.pig.drivers
org.apache.hcatalog.rcfile

Added: incubator/hcatalog/site/publish/docs/r0.2.0/api/resources/inherit.gif
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/api/resources/inherit.gif?rev=1178252&view=auto
==============================================================================
Binary file - no diff available.

Added: incubator/hcatalog/site/publish/docs/r0.2.0/api/serialized-form.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/api/serialized-form.html?rev=1178252&view=auto
==============================================================================
Serialized Form

Package org.apache.hcatalog.common

Class org.apache.hcatalog.common.HCatException extends java.io.IOException implements Serializable
serialVersionUID: 1L
Serialized Fields:
  • ErrorType errorType - The error type enum for this exception.

Package org.apache.hcatalog.data

Class org.apache.hcatalog.data.HCatArrayBag extends java.lang.Object implements Serializable
Serialized Fields:
  • java.util.List<E> rawItemList
  • org.apache.pig.data.DataBag convertedBag

Class org.apache.hcatalog.data.Pair extends java.lang.Object implements Serializable
serialVersionUID: 1L
Serialized Fields:
  • java.lang.Object first
  • java.lang.Object second

Package org.apache.hcatalog.data.schema

Class org.apache.hcatalog.data.schema.HCatFieldSchema extends java.lang.Object implements Serializable
serialVersionUID: 1L
Serialized Fields:
  • java.lang.String fieldName
  • java.lang.String comment
  • HCatFieldSchema.Type type
  • HCatFieldSchema.Category category
  • HCatSchema subSchema
  • HCatFieldSchema.Type mapKeyType
  • java.lang.String typeString

Class org.apache.hcatalog.data.schema.HCatSchema extends java.lang.Object implements Serializable
serialVersionUID: 1L
Serialized Fields:
  • java.util.List<E> fieldSchemas
  • java.util.Map<K,V> fieldPositionMap
  • java.util.List<E> fieldNames

Package org.apache.hcatalog.mapreduce

Class org.apache.hcatalog.mapreduce.HCatTableInfo extends java.lang.Object implements Serializable
serialVersionUID: 1L
Serialized Fields:
  • HCatTableInfo.TableInfoType tableInfoType
  • java.lang.String serverUri - The metadata server URI.
  • java.lang.String serverKerberosPrincipal - If the HCatalog server is configured to work with Hadoop security, this variable holds the server's principal name, which is used to authenticate to the HCatalog server with Kerberos.
  • java.lang.String dbName - The database name.
  • java.lang.String tableName - The table name.
  • java.lang.String filter - The partition filter.
  • java.lang.String partitionPredicates - The partition predicates to filter on; an arbitrary AND/OR filter, if used for input.
  • JobInfo jobInfo - The information about the partitions matching the specified query.
  • java.util.Map<K,V> partitionValues - The partition values to publish to, if used for output.
  • java.util.List<E> dynamicPartitioningKeys - The list of keys for which values were not specified at write setup time, to be inferred at write time.

Class org.apache.hcatalog.mapreduce.JobInfo extends java.lang.Object implements Serializable
serialVersionUID: 1L
Serialized Fields:
  • java.lang.String dbName - The database name.
  • java.lang.String tableName - The table name.
  • HCatSchema tableSchema - The table schema.
  • java.util.List<E> partitions - The list of partitions matching the filter.

Class org.apache.hcatalog.mapreduce.PartInfo extends java.lang.Object implements Serializable
serialVersionUID: 1L
Serialized Fields:
  • HCatSchema partitionSchema - The partition schema.
  • java.lang.String inputStorageDriverClass - The input storage driver class to use.
  • java.util.Properties hcatProperties - HCat-specific properties set on the partition.
  • java.lang.String location - The data location.
  • java.util.Map<K,V> partitionValues - The map of partition key names to their values.
Added: incubator/hcatalog/site/publish/docs/r0.2.0/api/stylesheet.css
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/api/stylesheet.css?rev=1178252&view=auto
==============================================================================

/* Javadoc style sheet */
/* Define colors, fonts and other style attributes here to override the defaults */

/* Page background color */
body { background-color: #FFFFFF; color: #000000 }

/* Headings */
h1 { font-size: 145% }

/* Table colors */
.TableHeadingColor    { background: #CCCCFF; color: #000000 } /* Dark mauve */
.TableSubHeadingColor { background: #EEEEFF; color: #000000 } /* Light mauve */
.TableRowColor        { background: #FFFFFF; color: #000000 } /* White */

/* Font used in left-hand frame lists */
.FrameTitleFont   { font-size: 100%; font-family: Helvetica, Arial, sans-serif; color: #000000 }
.FrameHeadingFont { font-size: 90%; font-family: Helvetica, Arial, sans-serif; color: #000000 }
.FrameItemFont    { font-size: 90%; font-family: Helvetica, Arial, sans-serif; color: #000000 }

/* Navigation bar fonts and colors */
.NavBarCell1    { background-color: #EEEEFF; color: #000000 } /* Light mauve */
.NavBarCell1Rev { background-color: #00008B; color: #FFFFFF } /* Dark Blue */
.NavBarFont1    { font-family: Arial, Helvetica, sans-serif; color: #000000 }
.NavBarFont1Rev { font-family: Arial, Helvetica, sans-serif; color: #FFFFFF }
.NavBarCell2    { font-family: Arial, Helvetica, sans-serif; background-color: #FFFFFF; color: #000000 }
.NavBarCell3    { font-family: Arial, Helvetica, sans-serif; background-color: #FFFFFF; color: #000000 }

Added: incubator/hcatalog/site/publish/docs/r0.2.0/broken-links.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/broken-links.xml?rev=1178252&view=auto
==============================================================================

Added: incubator/hcatalog/site/publish/docs/r0.2.0/cli.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/cli.html?rev=1178252&view=auto
==============================================================================
Command Line Interface

Set Up
The HCatalog command line interface (CLI) can be invoked as hcat.

Authentication
If a failure results in a message like "2010-11-03 16:17:28,225 WARN hive.metastore ... - Unable to connect metastore with URI thrift://..." in /tmp/<username>/hive.log, make sure you have run "kinit <username>@FOO.COM" to obtain a Kerberos ticket so that you can authenticate to the HCatalog server.

If other errors occur while using the HCatalog CLI, more detailed messages (if any) are written to /tmp/<username>/hive.log.
HCatalog CLI

The HCatalog CLI supports these command line options:
  • -g: Usage is -g mygroup .... Indicates that the table to be created must have group "mygroup".
  • -p: Usage is -p rwxr-xr-x .... Indicates that the table to be created must have permissions "rwxr-xr-x".
  • -f: Usage is -f myscript.hcatalog .... Indicates that myscript.hcatalog is a file containing DDL commands to execute.
  • -e: Usage is -e 'create table mytable(a int);' .... Indicates that the following string is to be treated as a DDL command and executed.

+

Note the following:

+
    + +
  • The -g and -p options are not mandatory. +
  • + +
  • Only one of the -e or -f option can be provided, not both. +
  • + +
  • The order of options is immaterial; you can specify the options in any order. +
  • + +
  • If no option is provided, then a usage message is printed: +
    +Usage: hcat  { -e "<query>" | -f "<filepath>" } [-g "<group>" ] [-p "<perms>"]
    +
    + +
  • + +
+

+

Assumptions
When using the HCatalog CLI, you cannot specify a permission string that omits read permission for the owner, such as -wxrwxr-x. If such a permission setting is desired, use the octal version instead, which in this case would be 375. Any other kind of permission string where the owner has read permission (for example, r-x------ or r--r--r--) works fine.
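For instance (a sketch, assuming -p accepts the octal form as the paragraph above suggests):

hcat -p 375 -e 'create table mytable(a int) stored as rcfile;'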
HCatalog DDL

HCatalog supports a subset of the Hive Data Definition Language. For those commands that are supported, any variances from Hive are noted below.
Create/Drop/Alter Table

CREATE TABLE

The STORED AS clause in Hive is:

[STORED AS file_format]
file_format:
  : SEQUENCEFILE
  | TEXTFILE
  | RCFILE
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

The STORED AS clause in HCatalog is:

[STORED AS file_format]
file_format:
  : RCFILE
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
                INPUTDRIVER input_driver_classname OUTPUTDRIVER output_driver_classname
Note the following:

  • The CREATE TABLE command must contain a "STORED AS" clause; if it doesn't, it will result in an exception containing the message "STORED AS specification is either incomplete or incorrect."

    In this release, HCatalog supports only reading PigStorage-formatted text files and only writing RCFile-formatted files. Therefore, for this release, the command must contain a "STORED AS" clause and either use RCFILE as the file format or specify org.apache.hadoop.hive.ql.io.RCFileInputFormat and org.apache.hadoop.hive.ql.io.RCFileOutputFormat as the INPUTFORMAT and OUTPUTFORMAT respectively. A valid example follows this list.

  • For partitioned tables, partition columns can only be of type String.

  • The CLUSTERED BY clause is not supported. If provided, the error message will contain "Operation not supported. HCatalog doesn't allow Clustered By in create table."
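For example, a minimal CREATE TABLE statement that satisfies these rules (an illustrative sketch; the table and column names are hypothetical):

create table web_logs (url string, ip string)
partitioned by (ds string)
stored as rcfile;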

+CREATE TABLE AS SELECT +

+

Not supported. Throws an exception with message "Operation Not Supported".

+

+CREATE TABLE LIKE +

+

Not supported. Throws an exception with message "Operation Not Supported".

+

+DROP TABLE +

+

Supported. Behavior the same as Hive.

+

+ALTER TABLE +

+
+ALTER TABLE table_name ADD partition_spec [ LOCATION 'location1' ] partition_spec [ LOCATION 'location2' ] ...
+ partition_spec:
+  : PARTITION (partition_col = partition_col_value, partition_col = partiton_col_value, ...)
+
+

Note the following:

+
    + +
  • Allowed only if TABLE table_name was created using HCatalog. Else, throws an exception containing error message "Operation not supported. Partitions can be added only in a table created through HCatalog. It seems table tablename was not created through HCatalog" +
  • + +
+

+

+ALTER TABLE FILE FORMAT +

+
+ALTER TABLE table_name SET FILEFORMAT file_format 
+
+

Note the following:

+
    + +
  • Here file_format must be same as the one described above in CREATE TABLE. Else, throw an exception "Operation not supported. Not a valid file format."
  • + +
  • CLUSTERED BY clause is not supported. If provided will result in an exception "Operation not supported."
  • + +
+

+ALTER TABLE Change Column Name/Type/Position/Comment +

+
+ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]
+
+

Not supported. Throws an exception with message "Operation Not Supported".

+

+ALTER TABLE Add/Replace Columns +

+
+ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)
+
+

Note the following:

+
    + +
  • ADD Columns is allowed. Behavior same as of Hive.
  • + +
  • Replace column is not supported. Throws an exception with message "Operation Not Supported".
  • + +
+

+ALTER TABLE TOUCH +

+
+ALTER TABLE table_name TOUCH;
+ALTER TABLE table_name TOUCH PARTITION partition_spec;
+
+

Not supported. Throws an exception with message "Operation Not Supported".

+ +

Create/Drop/Alter View

+

+CREATE VIEW +

+

Not supported. Throws an exception with message "Operation Not Supported".

+

+DROP VIEW +

+

Not supported. Throws an exception with message "Operation Not Supported".

+

+ALTER VIEW +

+

Not supported. Throws an exception with message "Operation Not Supported".

+ +

Show/Describe

+

+SHOW TABLES +

+

Supported. Behavior same as Hive.

+

+SHOW PARTITIONS +

+

Not supported. Throws an exception with message "Operation Not Supported".

+

+SHOW FUNCTIONS +

+

Supported. Behavior same as Hive.

+

+DESCRIBE +

+

Supported. Behavior same as Hive.

+ +

Other Commands

+

Any command not listed above is NOT supported and throws an exception with message "Operation Not Supported".

+
Added: incubator/hcatalog/site/publish/docs/r0.2.0/cli.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/cli.pdf?rev=1178252&view=auto
==============================================================================
Binary file - no diff available.

Added: incubator/hcatalog/site/publish/docs/r0.2.0/dynpartition.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/dynpartition.html?rev=1178252&view=auto
==============================================================================

Dynamic Partitioning

+ + + + + +

Overview

+
+

In earlier versions of HCatalog, users reading data could specify the table and the partition key/value combinations to prune, as if specifying a SQL-like WHERE clause. Writing data, however, was not as seamless: users still had to write out data to the table partition by partition, and those writes required fine-grained knowledge, in advance, of which key/value pairs were needed, as well as requiring the user to have already grouped the requisite data accordingly before attempting to store.
The following Pig script illustrates this:

A = load 'raw' using HCatLoader();
...
split Z into for_us if region=='us', for_eu if region=='eu', for_asia if region=='asia';
store for_us into 'processed' using HCatStorer("ds=20110110, region=us");
store for_eu into 'processed' using HCatStorer("ds=20110110, region=eu");
store for_asia into 'processed' using HCatStorer("ds=20110110, region=asia");

This approach had a major issue. MapReduce programs and Pig scripts needed to be aware of all the possible values of a key, and these values needed to be maintained and/or modified when new values were introduced. With more partitions, scripts began to look cumbersome. And if each partition being written launched a separate HCatalog store, we were increasing the load on the HCatalog server and launching more jobs for the store by a factor of the number of partitions.

+

A better approach is to have HCatalog determine all the partitions required from the data being written. This would allow us to simplify the above script into the following:

+
+A = load 'raw' using HCatLoader(); 
+... 
+store Z into 'processed' using HCatStorer("ds=20110110"); 
+
+

The way dynamic partitioning works is that HCatalog locates partition columns in the data passed to it and uses the data in these columns to split the rows across multiple partitions. (The data passed to HCatalog must have a schema that matches the schema of the destination table and hence should always contain partition columns.) It is important to note that partition columns can’t contain null values or the whole process will fail. It is also important note that all partitions created during a single run are part of a transaction and if any part of the process fails none of the partitions will be added to the table.

+
Usage with Pig
Usage from Pig is very simple! Instead of specifying all keys as one normally does for a store, users can specify only the keys they actually need. HCatOutputFormat will trigger dynamic partitioning when necessary (when a key's value is not specified) and will inspect the data in order to write it out appropriately.
So this statement...

store A into 'mytable' using HCatStorer("a=1, b=1");

...is equivalent to any of the following statements, if the data has only values where a=1 and b=1:

store A into 'mytable' using HCatStorer();

store A into 'mytable' using HCatStorer("a=1");

store A into 'mytable' using HCatStorer("b=1");

On the other hand, if there is data that spans more than one partition, then HCatOutputFormat will automatically figure out how to spray the data appropriately.

+

For example, let's say a=1 for all values across our dataset and b takes the value 1 and 2. Then the following statement...

+
+store A into 'mytable' using HCatStorer();
+
+

...is equivalent to either of these statements:

+
+store A into 'mytable' using HCatStorer("a=1");
+
+
+split A into A1 if b='1', A2 if b='2';
+store A1 into 'mytable' using HCatStorer("a=1, b=1");
+store A2 into 'mytable' using HCatStorer("a=1, b=2");
+
+
Usage from MapReduce
As with Pig, the only change a MapReduce programmer sees with dynamic partitioning is that they don't have to specify all the partition key/value combinations.

A current code example for writing out a specific partition for (a=1, b=1) would go something like this:
// Fully specify the target partition: (a=1, b=1).
Map<String, String> partitionValues = new HashMap<String, String>();
partitionValues.put("a", "1");
partitionValues.put("b", "1");

// Describe the output table and partition, then bind it to the job.
HCatTableInfo info = HCatTableInfo.getOutputTableInfo(
    serverUri, serverKerberosPrincipal, dbName, tblName, partitionValues);
HCatOutputFormat.setOutput(job, info);

And to write to multiple partitions, separate jobs will have to be kicked off with each of the above.

+

With dynamic partition, we simply specify only as many keys as we know about, or as required. It will figure out the rest of the keys by itself and spray out necessary partitions, being able to create multiple partitions with a single job.

+
Compaction

+
+ + + + +
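For example, the client-side hive-site.xml entry would look like this (a sketch; only the parameter named above is shown):

<property>
  <name>hive.archive.enabled</name>
  <value>true</value>
</property>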

References

+ + + +
+ +
 
+
+ + + Added: incubator/hcatalog/site/publish/docs/r0.2.0/dynpartition.pdf URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/dynpartition.pdf?rev=1178252&view=auto ============================================================================== Binary file - no diff available. Propchange: incubator/hcatalog/site/publish/docs/r0.2.0/dynpartition.pdf ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream Added: incubator/hcatalog/site/publish/docs/r0.2.0/images/built-with-forrest-button.png URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/images/built-with-forrest-button.png?rev=1178252&view=auto ============================================================================== Binary file - no diff available. Propchange: incubator/hcatalog/site/publish/docs/r0.2.0/images/built-with-forrest-button.png ------------------------------------------------------------------------------ svn:mime-type = application/octet-stream