Subject: svn commit: r1091509 [1/8] - in /incubator/hcatalog/trunk: ./ bin/ ivy/ src/ src/docs/ src/docs/src/ src/docs/src/documentation/ src/docs/src/documentation/classes/ src/docs/src/documentation/conf/ src/docs/src/documentation/content/ src/docs/src/docum...
Date: Tue, 12 Apr 2011 17:30:12 -0000
To: hcatalog-commits@incubator.apache.org
From: gates@apache.org

Author: gates
Date: Tue Apr 12 17:30:08 2011
New Revision: 1091509

URL: http://svn.apache.org/viewvc?rev=1091509&view=rev
Log:
Initial checkin of HCatalog code.
Added:
    incubator/hcatalog/trunk/bin/
    incubator/hcatalog/trunk/bin/hcat.sh
    incubator/hcatalog/trunk/build.xml
    incubator/hcatalog/trunk/ivy/
    incubator/hcatalog/trunk/ivy.xml
    incubator/hcatalog/trunk/ivy/libraries.properties
    incubator/hcatalog/trunk/src/
    incubator/hcatalog/trunk/src/docs/
    incubator/hcatalog/trunk/src/docs/forrest.properties
    incubator/hcatalog/trunk/src/docs/forrest.properties.dispatcher.properties
    incubator/hcatalog/trunk/src/docs/forrest.properties.xml
    incubator/hcatalog/trunk/src/docs/src/
    incubator/hcatalog/trunk/src/docs/src/documentation/
    incubator/hcatalog/trunk/src/docs/src/documentation/README.txt
    incubator/hcatalog/trunk/src/docs/src/documentation/classes/
    incubator/hcatalog/trunk/src/docs/src/documentation/classes/CatalogManager.properties
    incubator/hcatalog/trunk/src/docs/src/documentation/conf/
    incubator/hcatalog/trunk/src/docs/src/documentation/conf/cli.xconf
    incubator/hcatalog/trunk/src/docs/src/documentation/content/
    incubator/hcatalog/trunk/src/docs/src/documentation/content/locationmap.xml
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/cli.xml
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/inputoutput.xml
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/loadstore.xml
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/site.xml
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/supportedformats.xml
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/tabs.xml
    incubator/hcatalog/trunk/src/docs/src/documentation/resources/
    incubator/hcatalog/trunk/src/docs/src/documentation/resources/images/
    incubator/hcatalog/trunk/src/docs/src/documentation/resources/images/ellipse-2.svg
    incubator/hcatalog/trunk/src/docs/src/documentation/sitemap.xmap
    incubator/hcatalog/trunk/src/docs/src/documentation/skinconf.xml
    incubator/hcatalog/trunk/src/java/
    incubator/hcatalog/trunk/src/java/org/
    incubator/hcatalog/trunk/src/java/org/apache/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/HCatCli.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/HCatDriver.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/SemanticAnalysis/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/SemanticAnalysis/AddPartitionHook.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/SemanticAnalysis/AlterTableFileFormatHook.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/SemanticAnalysis/CreateDatabaseHook.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/SemanticAnalysis/CreateTableHook.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/cli/SemanticAnalysis/HCatSemanticAnalyzer.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/common/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/common/AuthUtils.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/common/ErrorType.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/common/HCatConstants.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/common/HCatException.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/common/HCatUtil.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/DataType.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/DefaultHCatRecord.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/HCatArrayBag.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/HCatRecord.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/HCatRecordable.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/Pair.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/ReaderWriter.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/schema/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/schema/HCatFieldSchema.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/schema/HCatSchema.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/schema/HCatSchemaUtils.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputStorageDriver.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputCommitter.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputStorageDriver.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatRecordReader.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatRecordWriter.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatSplit.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatTableInfo.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/InitializeInput.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/JobInfo.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/OutputJobInfo.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/oozie/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/oozie/JavaAction.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/HCatLoader.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/HCatStorer.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/PigHCatUtil.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/drivers/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/drivers/LoadFuncBasedInputDriver.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/drivers/LoadFuncBasedInputFormat.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/pig/drivers/PigStorageInputDriver.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/rcfile/
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/rcfile/RCFileInputDriver.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/rcfile/RCFileMapReduceInputFormat.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/rcfile/RCFileMapReduceOutputFormat.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/rcfile/RCFileMapReduceRecordReader.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/rcfile/RCFileOutputDriver.java
    incubator/hcatalog/trunk/src/test/
    incubator/hcatalog/trunk/src/test/org/
    incubator/hcatalog/trunk/src/test/org/apache/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/ExitException.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/MiniCluster.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/NoExitSecurityManager.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/cli/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/cli/TestPermsGrp.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/cli/TestSemanticAnalysis.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/cli/TestUseDatabase.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/common/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/common/TestHCatUtil.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/data/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/data/TestDefaultHCatRecord.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/data/schema/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/data/schema/TestHCatSchemaUtils.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/mapreduce/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/mapreduce/HCatMapReduceTest.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatHiveCompatibility.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatNonPartitioned.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatOutputFormat.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/mapreduce/TestHCatPartitioned.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/pig/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/pig/MyPigStorageDriver.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/pig/TestHCatLoader.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/pig/TestHCatStorer.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/pig/TestHCatStorerMulti.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/pig/TestPermsInheritance.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/pig/TestPigStorageDriver.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/rcfile/
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/rcfile/TestRCFileInputStorageDriver.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/rcfile/TestRCFileMapReduceInputFormat.java
    incubator/hcatalog/trunk/src/test/org/apache/hcatalog/rcfile/TestRCFileOutputStorageDriver.java

Added: incubator/hcatalog/trunk/bin/hcat.sh
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/bin/hcat.sh?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/bin/hcat.sh (added)
+++ incubator/hcatalog/trunk/bin/hcat.sh Tue Apr 12 17:30:08 2011
@@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+HOWL_DIR=`dirname "$0"`
+
+HOWL_JAR_LOC=`find . -name "hcatalog*.jar"`
+
+HADOOP_CLASSPATH=$HADOOP_CLASSPATH:${HOWL_JAR_LOC}:../lib/commons-cli-2.0-SNAPSHOT.jar:../build/cli/hive-cli-0.7.0.jar:../ql/lib/antlr-runtime-3.0.1.jar
+
+export HADOOP_CLASSPATH=$HADOOP_CLASSPATH
+
+for f in `ls ../build/dist/lib/*.jar`; do
+  HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
+done
+
+HADOOP_OPTS="$HADOOP_OPTS -Dhive.metastore.uris=thrift://localhost:9083 "
+
+export HADOOP_OPTS=$HADOOP_OPTS
+
+exec $HADOOP_HOME/bin/hadoop jar ${HOWL_JAR_LOC} org.apache.hcatalog.cli.HCatCli "$@"
+
+# Above is the recommended way to launch the hcatalog cli. If it doesnt work, you can try the following:
+# java -Dhive.metastore.uris=thrift://localhost:9083 -cp ../lib/commons-logging-1.0.4.jar:../build/hadoopcore/hadoop-0.20.0/hadoop-0.20.0-core.jar:../lib/commons-cli-2.0-SNAPSHOT.jar:../build/cli/hive-cli-0.7.0.jar:../ql/lib/antlr-runtime-3.0.1.jar:$HOWL_JAR org.apache.hcatalog.cli.HCatCli "$@"

Added: incubator/hcatalog/trunk/build.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/build.xml?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/build.xml (added)
+++ incubator/hcatalog/trunk/build.xml Tue Apr 12 17:30:08 2011
@@ -0,0 +1,78 @@
[XML markup stripped in this archived copy; no text content is recoverable.]

Added: incubator/hcatalog/trunk/ivy.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/ivy.xml?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/ivy.xml (added)
+++ incubator/hcatalog/trunk/ivy.xml Tue Apr 12 17:30:08 2011
@@ -0,0 +1,39 @@
[XML markup stripped in this archived copy; the only recoverable element text is the module description "Apache Hadoop Howl".]

Added: incubator/hcatalog/trunk/ivy/libraries.properties
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/ivy/libraries.properties?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/ivy/libraries.properties (added)
+++ incubator/hcatalog/trunk/ivy/libraries.properties Tue Apr 12 17:30:08 2011
@@ -0,0 +1,17 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#This properties file lists the versions of the various artifacts used by hadoop and components.
+#It drives ivy and the generation of a maven POM
+
+junit.version=3.8.1
+

Added: incubator/hcatalog/trunk/src/docs/forrest.properties
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/forrest.properties?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/forrest.properties (added)
+++ incubator/hcatalog/trunk/src/docs/forrest.properties Tue Apr 12 17:30:08 2011
@@ -0,0 +1,166 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+##############
+# These are the defaults, un-comment them only if you need to change them.
+#
+# You can even have a completely empty file, to assist with maintenance.
+# This file is required, even if empty.
+#
+# The file obtained from 'forrest seed-sample' shows the defaults.
+##############
+
+# Prints out a summary of Forrest settings for this project
+#forrest.echo=true
+
+# Project name (used to name .war file)
+#project.name=my-project
+
+# Specifies name of Forrest skin to use
+# See list at http://forrest.apache.org/docs/skins.html
+#project.skin=pelt
+
+# codename: Dispatcher
+# Dispatcher is using a fallback mechanism for theming.
+# You can configure the theme name and its extension here
+#project.theme-extension=.fv
+#project.theme=pelt
+
+
+# Descriptors for plugins and skins
+# comma separated list, file:// is supported
+#forrest.skins.descriptors=http://forrest.apache.org/skins/skins.xml,file:///c:/myskins/skins.xml
+#forrest.plugins.descriptors=http://forrest.apache.org/plugins/plugins.xml,http://forrest.apache.org/plugins/whiteboard-plugins.xml
+
+##############
+# behavioural properties
+#project.menu-scheme=tab_attributes
+#project.menu-scheme=directories
+
+##############
+# layout properties
+
+# Properties that can be set to override the default locations
+#
+# Parent properties must be set. This usually means uncommenting
+# project.content-dir if any other property using it is uncommented
+
+#project.status=status.xml
+#project.content-dir=src/documentation
+#project.raw-content-dir=${project.content-dir}/content
+#project.conf-dir=${project.content-dir}/conf
+#project.sitemap-dir=${project.content-dir}
+#project.xdocs-dir=${project.content-dir}/content/xdocs
+#project.resources-dir=${project.content-dir}/resources
+#project.stylesheets-dir=${project.resources-dir}/stylesheets
+#project.images-dir=${project.resources-dir}/images
+#project.schema-dir=${project.resources-dir}/schema
+#project.skins-dir=${project.content-dir}/skins
+#project.skinconf=${project.content-dir}/skinconf.xml
+#project.lib-dir=${project.content-dir}/lib
+#project.classes-dir=${project.content-dir}/classes
+#project.translations-dir=${project.content-dir}/translations
+
+#project.build-dir=${project.home}/build
+#project.site=site
+#project.site-dir=${project.build-dir}/${project.site}
+#project.temp-dir=${project.build-dir}/tmp
+
+##############
+# Cocoon catalog entity resolver properties
+# A local OASIS catalog file to supplement the default Forrest catalog
+#project.catalog=${project.schema-dir}/catalog.xcat
+
+##############
+# validation properties
+
+# This set of properties determine if validation is performed
+# Values are inherited unless overridden.
+# e.g. if forrest.validate=false then all others are false unless set to true.
+#forrest.validate=true
+#forrest.validate.xdocs=${forrest.validate}
+#forrest.validate.skinconf=${forrest.validate}
+
+# PIG-1508: Workaround for http://issues.apache.org/jira/browse/FOR-984
+# Remove when forrest-0.9 is available
+forrest.validate.sitemap=false
+
+#forrest.validate.stylesheets=${forrest.validate}
+#forrest.validate.skins=${forrest.validate}
+#forrest.validate.skins.stylesheets=${forrest.validate.skins}
+
+# *.failonerror=(true|false) - stop when an XML file is invalid
+#forrest.validate.failonerror=true
+
+# *.excludes=(pattern) - comma-separated list of path patterns to not validate
+# Note: If you do add an "excludes" list then you need to specify site.xml too.
+# e.g.
+#forrest.validate.xdocs.excludes=site.xml, samples/subdir/**, samples/faq.xml
+#forrest.validate.xdocs.excludes=site.xml
+
+
+##############
+# General Forrest properties
+
+# The URL to start crawling from
+#project.start-uri=linkmap.html
+
+# Set logging level for messages printed to the console
+# (DEBUG, INFO, WARN, ERROR, FATAL_ERROR)
+#project.debuglevel=ERROR
+
+# Max memory to allocate to Java
+#forrest.maxmemory=64m
+
+# Any other arguments to pass to the JVM. For example, to run on an X-less
+# server, set to -Djava.awt.headless=true
+#forrest.jvmargs=
+
+# The bugtracking URL - the issue number will be appended
+# Projects would use their own issue tracker, of course.
+#project.bugtracking-url=http://issues.apache.org/bugzilla/show_bug.cgi?id=
+#project.bugtracking-url=http://issues.apache.org/jira/browse/
+
+# The issues list as rss
+#project.issues-rss-url=
+
+#I18n Property. Based on the locale request for the browser.
+#If you want to use it for static site then modify the JVM system.language
+# and run once per language
+#project.i18n=false
+project.configfile=${project.home}/src/documentation/conf/cli.xconf
+
+# The names of plugins that are required to build the project
+# comma separated list (no spaces)
+# You can request a specific version by appending "-VERSION" to the end of
+# the plugin name. If you exclude a version number, the latest released version
+# will be used. However, be aware that this may be a development version. In
+# a production environment it is recommended that you specify a known working
+# version.
+# Run "forrest available-plugins" for a list of plug-ins currently available.
+
+project.required.plugins=org.apache.forrest.plugin.output.pdf,org.apache.forrest.plugin.input.simplifiedDocbook
+
+
+# codename: Dispatcher
+# Add the following plugins to project.required.plugins:
+#org.apache.forrest.plugin.internal.dispatcher,org.apache.forrest.themes.core,org.apache.forrest.plugin.output.inputModule
+
+# Proxy configuration
+# - proxy.user and proxy.password are only needed if the proxy is an authenticated one...
+# proxy.host=myproxy.myhost.com
+# proxy.port=
+# proxy.user=
+# proxy.password=

Added: incubator/hcatalog/trunk/src/docs/forrest.properties.dispatcher.properties
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/forrest.properties.dispatcher.properties?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/forrest.properties.dispatcher.properties (added)
+++ incubator/hcatalog/trunk/src/docs/forrest.properties.dispatcher.properties Tue Apr 12 17:30:08 2011
@@ -0,0 +1,25 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+##############
+# Note: The reason for this "forrest.properties.dispatcher.properties"
+# is to assist with automated testing (main/build.sh test).
+# Its content redefines the project.required.plugins property which defines
+# the list of required plugins for the dispatcher.
+# To test the Dispatcher in development, simply replace the
+# project.required.plugins property in the forrest.properties file by the one
+# defined in this file.
+#
+project.required.plugins=org.apache.forrest.plugin.output.pdf,org.apache.forrest.plugin.internal.dispatcher,org.apache.forrest.themes.core,org.apache.forrest.plugin.output.inputModule

Added: incubator/hcatalog/trunk/src/docs/forrest.properties.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/forrest.properties.xml?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/forrest.properties.xml (added)
+++ incubator/hcatalog/trunk/src/docs/forrest.properties.xml Tue Apr 12 17:30:08 2011
@@ -0,0 +1,29 @@
[XML markup stripped in this archived copy; no text content is recoverable.]

Added: incubator/hcatalog/trunk/src/docs/src/documentation/README.txt
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/README.txt?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/README.txt (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/README.txt Tue Apr 12 17:30:08 2011
@@ -0,0 +1,23 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+This is the base documentation directory.
+
+skinconf.xml    # This file customizes Forrest for your project. In it, you
+                # tell forrest the project name, logo, copyright info, etc
+
+sitemap.xmap    # Optional. This sitemap is consulted before all core sitemaps.
+                # See http://forrest.apache.org/docs/project-sitemap.html

Added: incubator/hcatalog/trunk/src/docs/src/documentation/classes/CatalogManager.properties
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/classes/CatalogManager.properties?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/classes/CatalogManager.properties (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/classes/CatalogManager.properties Tue Apr 12 17:30:08 2011
@@ -0,0 +1,62 @@
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#=======================================================================
+# CatalogManager.properties for Catalog Entity Resolver.
+#
+# This is the default properties file for your project.
+# This facilitates local configuration of application-specific catalogs.
+# If you have defined any local catalogs, then they will be loaded
+# before Forrest's core catalogs.
+#
+# See the Apache Forrest documentation:
+# http://forrest.apache.org/docs/your-project.html
+# http://forrest.apache.org/docs/validation.html

+# verbosity:
+# The level of messages for status/debug (messages go to standard output).
+# The setting here is for your own local catalogs.
+# The verbosity of Forrest's core catalogs is controlled via
+# main/webapp/WEB-INF/cocoon.xconf
+#
+# The following messages are provided ...
+#  0 = none
+#  1 = ? (... not sure yet)
+#  2 = 1+, Loading catalog, Resolved public, Resolved system
+#  3 = 2+, Catalog does not exist, resolvePublic, resolveSystem
+#  10 = 3+, List all catalog entries when loading a catalog
+#    (Cocoon also logs the "Resolved public" messages.)
+verbosity=1

+# catalogs ... list of additional catalogs to load
+# (Note that Apache Forrest will automatically load its own default catalog
+# from main/webapp/resources/schema/catalog.xcat)
+# Use either full pathnames or relative pathnames.
+# pathname separator is always semi-colon (;) regardless of operating system
+# directory separator is always slash (/) regardless of operating system
+# The project catalog is expected to be at ../resources/schema/catalog.xcat
+#catalogs=../resources/schema/catalog.xcat
+# FIXME: Workaround FOR-548 "project DTD catalogs are not included
+# when running as a servlet WAR".
+# Same catalog, different path
+catalogs=../resources/schema/catalog.xcat;../../project/src/documentation/resources/schema/catalog.xcat

+# relative-catalogs
+# If false, relative catalog URIs are made absolute with respect to the
+# base URI of the CatalogManager.properties file.  This setting only
+# applies to catalog URIs obtained from the catalogs property in the
+# CatalogManager.properties file
+# Example: relative-catalogs=[yes|no]
+relative-catalogs=no

Added: incubator/hcatalog/trunk/src/docs/src/documentation/conf/cli.xconf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/conf/cli.xconf?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/conf/cli.xconf (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/conf/cli.xconf Tue Apr 12 17:30:08 2011
@@ -0,0 +1,327 @@
[XML markup stripped in this archived copy; the only recoverable element text is the Cocoon CLI settings ".", "WEB-INF/cocoon.xconf", "../tmp/cocoon-work", "../site", "index.html", and "*/*".]

Added: incubator/hcatalog/trunk/src/docs/src/documentation/content/locationmap.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/locationmap.xml?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/locationmap.xml (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/locationmap.xml Tue Apr 12 17:30:08 2011
@@ -0,0 +1,72 @@
[XML markup stripped in this archived copy; no text content is recoverable.]

Added: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/cli.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/cli.xml?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/cli.xml (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/cli.xml Tue Apr 12 17:30:08 2011
@@ -0,0 +1,219 @@
[XML markup stripped in this archived copy; the recovered document text follows.]
HCatalog Command Line Interface

Set Up

The HCatalog command line interface (CLI) can be invoked as hcat.

Authentication

If a failure results in a message like "2010-11-03 16:17:28,225 WARN hive.metastore ... - Unable to connect metastore with URI thrift://..." in /tmp/<username>/hive.log, make sure you have run "kinit <username>@FOO.COM" to get a Kerberos ticket so that you can authenticate to the HCatalog server.

If other errors occur while using the HCatalog CLI, more detailed messages (if any) are written to /tmp/<username>/hive.log.
HCatalog CLI

The HCatalog CLI supports these command line options:

  • -g: Usage is -g mygroup .... This tells HCatalog that the table to be created must have "mygroup" as its group.
  • -p: Usage is -p rwxr-xr-x .... This tells HCatalog that the table to be created must have "rwxr-xr-x" as its permissions.
  • -f: Usage is -f myscript.hcatalog .... This tells HCatalog that myscript.hcatalog is a file containing DDL commands to execute.
  • -e: Usage is -e 'create table mytable(a int);' .... This tells HCatalog to treat the following string as a DDL command and execute it.

Note the following:

  • The -g and -p options are not mandatory.
  • Only one of the -e or -f options can be provided, not both.
  • The order of options is immaterial; you can specify the options in any order.
  • If no option is provided, a usage message is printed:

    Usage: hcat { -e "<query>" | -f "<filepath>" } [-g "<group>"] [-p "<perms>"]

Assumptions

When using the HCatalog CLI, you cannot specify a permission string without read permissions for the owner, such as -wxrwxr-x. If such a permission setting is desired, you can use the octal version instead, which in this case would be 375. Any other kind of permission string where the owner has read permissions (for example r-x------ or r--r--r--) will work fine.
HCatalog DDL

HCatalog supports a subset of the Hive Data Definition Language. For those commands that are supported, any variances from Hive are noted below.
Create/Drop/Alter Table

CREATE TABLE

The STORED AS clause in Hive is:

  [STORED AS file_format]
  file_format:
    : SEQUENCEFILE
    | TEXTFILE
    | RCFILE
    | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname

The STORED AS clause in HCatalog is:

  [STORED AS file_format]
  file_format:
    : RCFILE
    | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
      INPUTDRIVER input_driver_classname OUTPUTDRIVER output_driver_classname

Note the following:

  • The CREATE TABLE command must contain a "STORED AS" clause; if it does not, it fails with an exception containing the message "STORED AS specification is either incomplete or incorrect."

    In this release, HCatalog supports only reading PigStorage formatted text files and only writing RCFile formatted files. Therefore, for this release, the command must contain a "STORED AS" clause and either use RCFILE as the file format or specify org.apache.hadoop.hive.ql.io.RCFileInputFormat and org.apache.hadoop.hive.ql.io.RCFileOutputFormat as the INPUTFORMAT and OUTPUTFORMAT respectively.

  • For partitioned tables, partition columns can only be of type String.
  • The CLUSTERED BY clause is not supported. If it is provided, the error message will contain "Operation not supported. HCatalog doesn't allow Clustered By in create table."

CREATE TABLE AS SELECT

Not supported. Throws an exception with the message "Operation Not Supported".

CREATE TABLE LIKE

Not supported. Throws an exception with the message "Operation Not Supported".

DROP TABLE

Supported. Behavior is the same as in Hive.

ALTER TABLE

  ALTER TABLE table_name ADD partition_spec [ LOCATION 'location1' ] partition_spec [ LOCATION 'location2' ] ...
    partition_spec:
      : PARTITION (partition_col = partition_col_value, partition_col = partition_col_value, ...)

Note the following:

  • Allowed only if table_name was created through HCatalog. Otherwise, an exception is thrown containing the error message "Operation not supported. Partitions can be added only in a table created through HCatalog. It seems table tablename was not created through HCatalog".

ALTER TABLE FILE FORMAT

  ALTER TABLE table_name SET FILEFORMAT file_format

Note the following:

  • Here file_format must be the same as the one described above in CREATE TABLE. Otherwise, an exception is thrown: "Operation not supported. Not a valid file format."
  • The CLUSTERED BY clause is not supported. If it is provided, it results in an exception "Operation not supported."

ALTER TABLE Change Column Name/Type/Position/Comment

  ALTER TABLE table_name CHANGE [COLUMN] col_old_name col_new_name column_type [COMMENT col_comment] [FIRST|AFTER column_name]

Not supported. Throws an exception with the message "Operation Not Supported".

ALTER TABLE Add/Replace Columns

  ALTER TABLE table_name ADD|REPLACE COLUMNS (col_name data_type [COMMENT col_comment], ...)

Note the following:

  • ADD COLUMNS is allowed. Behavior is the same as in Hive.
  • REPLACE COLUMNS is not supported. Throws an exception with the message "Operation Not Supported".

ALTER TABLE TOUCH

  ALTER TABLE table_name TOUCH;
  ALTER TABLE table_name TOUCH PARTITION partition_spec;

Not supported. Throws an exception with the message "Operation Not Supported".
Create/Drop/Alter View

CREATE VIEW

Not supported. Throws an exception with the message "Operation Not Supported".

DROP VIEW

Not supported. Throws an exception with the message "Operation Not Supported".

ALTER VIEW

Not supported. Throws an exception with the message "Operation Not Supported".
Show/Describe

SHOW TABLES

Supported. Behavior is the same as in Hive.

SHOW PARTITIONS

Not supported. Throws an exception with the message "Operation Not Supported".

SHOW FUNCTIONS

Supported. Behavior is the same as in Hive.

DESCRIBE

Supported. Behavior is the same as in Hive.
Other Commands

Any command not listed above is NOT supported and throws an exception with the message "Operation Not Supported".
Added: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml Tue Apr 12 17:30:08 2011
@@ -0,0 +1,116 @@
[XML markup stripped in this archived copy; the recovered document text follows.]
Overview

HCatalog

HCatalog is a table management and storage management layer for Hadoop that enables users with different data processing tools – Pig, MapReduce, Hive, Streaming – to more easily read and write data on the grid. HCatalog's table abstraction presents users with a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users need not worry about where or in what format their data is stored – RCFile format, text files, or sequence files.

(Note: In this release, Streaming is not supported. Also, HCatalog supports only writing RCFile formatted files and only reading PigStorage formatted text files.)
HCatalog Architecture

HCatalog is built on top of the Hive metastore and incorporates components from the Hive DDL. HCatalog provides read and write interfaces for Pig and MapReduce and a command line interface for data definitions.

(Note: HCatalog notification is not available in this release.)

Interfaces

The HCatalog interface for Pig – HCatLoader and HCatStorer – is an implementation of the Pig load and store interfaces. HCatLoader accepts a table to read data from; you can indicate which partitions to scan by immediately following the load statement with a partition filter statement. HCatStorer accepts a table to write to and a specification of partition keys to create a new partition. Currently HCatStorer only supports writing to one partition. HCatLoader and HCatStorer are implemented on top of HCatInputFormat and HCatOutputFormat respectively (see HCatalog Load and Store).

The HCatalog interface for MapReduce – HCatInputFormat and HCatOutputFormat – is an implementation of Hadoop InputFormat and OutputFormat. HCatInputFormat accepts a table to read data from and a selection predicate to indicate which partitions to scan. HCatOutputFormat accepts a table to write to and a specification of partition keys to create a new partition. Currently HCatOutputFormat only supports writing to one partition (see HCatalog Input and Output).

Note: Currently there is no Hive-specific interface. Since HCatalog uses Hive's metastore, Hive can read data in HCatalog directly as long as a SerDe for that data already exists. In the future we plan to write an HCatalogSerDe so that users won't need storage-specific SerDes and so that Hive users can write data to HCatalog. Currently, if a Hive user writes data in the RCFile format, it is possible to read that data through HCatalog. Also, see Supported data formats.

Data is defined using HCatalog's command line interface (CLI). The HCatalog CLI supports most of the DDL portion of Hive's query language, allowing users to create, alter, and drop tables, among other operations. The CLI also supports the data exploration part of the Hive command line, such as SHOW TABLES and DESCRIBE TABLE (see the HCatalog Command Line Interface).
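To make the MapReduce interface concrete, here is a minimal mapper sketch in Java. It is illustrative only: it assumes HCatRecord exposes positional access via get(int), and the table layout used (a hypothetical events table whose column 0 is advertiser_id and column 1 is clicks) is an assumption, not anything HCatalog dictates.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hcatalog.data.HCatRecord;

// Counts clicks per advertiser from a hypothetical events table read
// through HCatInputFormat, which delivers each row as an HCatRecord.
public class ClickCountMapper
        extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {

    @Override
    protected void map(WritableComparable key, HCatRecord value, Context context)
            throws IOException, InterruptedException {
        // Positional access; the column positions are an assumed table layout.
        String advertiserId = (String) value.get(0);  // column 0: advertiser_id
        Integer clicks = (Integer) value.get(1);      // column 1: clicks
        if (advertiserId != null && clicks != null) {
            context.write(new Text(advertiserId), new IntWritable(clicks));
        }
    }
}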
Data Model

HCatalog presents a relational view of data in HDFS. Data is stored in tables and these tables can be placed in databases. Tables can also be partitioned on one or more keys; that is, for a given value of a key (or set of keys) there will be one partition that contains all rows with that value (or set of values). For example, if a table is partitioned on date and there are three days of data in the table, there will be three partitions in the table. New partitions can be added to a table, and partitions can be dropped from a table. Partitioned tables have no partitions at create time. Unpartitioned tables effectively have one default partition that must be created at table creation time. There is no guaranteed read consistency when a partition is dropped.

Partitions contain records. Once a partition is created, records cannot be added to it, removed from it, or updated in it. (In the future some ability to integrate changes to a partition will be added.) Partitions are multi-dimensional, not hierarchical. Records are divided into columns. Columns have a name and a datatype. HCatalog supports the same datatypes as Hive (see HCatalog Load and Store).
Data Flow Example

This simple data flow example shows how HCatalog is used to move data from the grid into a database. From the database, the data can then be analyzed using Hive.

First, Joe in data acquisition uses distcp to get data onto the grid.

  hadoop distcp file:///file.dat hdfs://data/rawevents/20100819/data

  hcat "alter table rawevents add partition 20100819 hdfs://data/rawevents/20100819/data"

Second, Sally in data processing uses Pig to cleanse and prepare the data.

Without HCatalog, Sally must be manually informed by Joe that data is available, or she must poll on HDFS using Oozie.

  A = load '/data/rawevents/20100819/data' as (alpha:int, beta:chararray, …);
  B = filter A by bot_finder(zeta) = 0;
  …
  store Z into 'data/processedevents/20100819/data';

With HCatalog, Oozie will be notified by HCatalog that data is available and can then start the Pig job.

  A = load 'rawevents' using HCatLoader;
  B = filter A by date = '20100819' and by bot_finder(zeta) = 0;
  …
  store Z into 'processedevents' using HCatStorer("date=20100819");

Third, Robert in client management uses Hive to analyze his clients' results.

Without HCatalog, Robert must alter the table to add the required partition.

  alter table processedevents add partition 20100819 hdfs://data/processedevents/20100819/data

  select advertiser_id, count(clicks)
  from processedevents
  where date = '20100819'
  group by advertiser_id;

With HCatalog, Robert does not need to modify the table structure.

  select advertiser_id, count(clicks)
  from processedevents
  where date = '20100819'
  group by advertiser_id;
Added: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/inputoutput.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/inputoutput.xml?rev=1091509&view=auto
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/inputoutput.xml (added)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/inputoutput.xml Tue Apr 12 17:30:08 2011
@@ -0,0 +1,152 @@
[XML markup stripped in this archived copy; the recovered document text follows.]
HCatalog Input and Output Interfaces

Set Up

No HCatalog-specific setup is required for the HCatInputFormat and HCatOutputFormat interfaces.

Authentication

If a failure results in a message like "2010-11-03 16:17:28,225 WARN hive.metastore ... - Unable to connect metastore with URI thrift://..." in /tmp/<username>/hive.log, make sure you have run "kinit <username>@FOO.COM" to get a Kerberos ticket so that you can authenticate to the HCatalog server.
HCatInputFormat

HCatInputFormat is used with MapReduce jobs to read data from HCatalog managed tables.

HCatInputFormat exposes a new Hadoop 0.20 MapReduce API for reading data as if it had been published to a table. When a MapReduce job uses this InputFormat, the InputFormat configured for the table is used as the underlying InputFormat to read the data. Also, the maximum number of partitions that a job can work on is limited to 100K.
API

The API exposed by HCatInputFormat is shown below.

To use HCatInputFormat to read data, first instantiate an HCatTableInfo with the necessary information from the table being read, and then call setInput on the HCatInputFormat.

You can use the setOutputSchema method to include a projection schema that specifies the output fields. If a schema is not specified, this defaults to the table level schema.

You can use the getTableSchema method to determine the table schema for a specified input table.

  /**
   * Set the input to use for the Job. This queries the metadata server with
   * the specified partition predicates, gets the matching partitions, and puts
   * the information in the conf object. The inputInfo object is updated with
   * information needed in the client context.
   * @param job the job object
   * @param inputInfo the table input info
   * @throws IOException the exception in communicating with the metadata server
   */
  public static void setInput(Job job, HCatTableInfo inputInfo) throws IOException;

  /**
   * Set the schema for the HCatRecord data returned by HCatInputFormat.
   * @param job the job object
   * @param hcatSchema the schema to use as the consolidated schema
   */
  public static void setOutputSchema(Job job, HCatSchema hcatSchema) throws Exception;

  /**
   * Gets the HCatalog schema for the table specified in the HCatInputFormat.setInput call
   * on the specified job context. This information is available only after HCatInputFormat.setInput
   * has been called for a JobContext.
   * @param context the context
   * @return the table schema
   * @throws Exception if HCatInputFormat.setInput has not been called for the current context
   */
  public static HCatSchema getTableSchema(JobContext context) throws Exception;
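Putting the calls together, here is a minimal, hypothetical driver sketch for the read side. Only setInput, setOutputSchema, and getTableSchema come from the API above; the HCatTableInfo factory method (getInputTableInfo), its arguments, the metastore URI, and the mapper class are assumptions made for illustration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatInputFormat;
import org.apache.hcatalog.mapreduce.HCatTableInfo;

public class ReadDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "hcat-read-example");

        // Describe the table to read and, via a filter on the partition
        // keys, which partitions to scan (assumed factory method).
        HCatTableInfo inputInfo = HCatTableInfo.getInputTableInfo(
                "thrift://localhost:9083", null, "default", "processedevents",
                "date=\"20100819\"");
        HCatInputFormat.setInput(job, inputInfo);

        // Available only after setInput; here the full table schema is passed
        // back unchanged, but a real job could prune it to a projection.
        HCatSchema tableSchema = HCatInputFormat.getTableSchema(job);
        HCatInputFormat.setOutputSchema(job, tableSchema);

        job.setInputFormatClass(HCatInputFormat.class);
        job.setMapperClass(ClickCountMapper.class); // a mapper taking HCatRecord values
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}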
HCatOutputFormat

HCatOutputFormat is used with MapReduce jobs to write data to HCatalog managed tables.

HCatOutputFormat exposes a new Hadoop 0.20 MapReduce API for writing data to a table. If a MapReduce job uses this OutputFormat to write output, the default OutputFormat configured for the table is used as the underlying OutputFormat and the new partition is published to the table after the job completes.

API

The API exposed by HCatOutputFormat is shown below.

The first call on the HCatOutputFormat must be setOutput; any other call will throw an exception saying the output format is not initialized. The schema for the data being written out is specified by the setSchema method. If setSchema is not called, then by default it is assumed that the partition has the same schema as the current table level schema.

  /**
   * Set the info about the output to write for the Job. This queries the metadata server
   * to find the StorageDriver to use for the table. Throws an error if the partition is already published.
   * @param job the job object
   * @param outputInfo the table output info
   * @throws IOException the exception in communicating with the metadata server
   */
  public static void setOutput(Job job, HCatTableInfo outputInfo) throws IOException;

  /**
   * Set the schema for the data being written out to the partition. The
   * table schema is used by default for the partition if this is not called.
   * @param job the job object
   * @param schema the schema for the data
   * @throws IOException the exception
   */
  public static void setSchema(Job job, HCatSchema schema) throws IOException;

  /**
   * Gets the table schema for the table specified in the HCatOutputFormat.setOutput call
   * on the specified job context.
   * @param context the context
   * @return the table schema
   * @throws IOException if HCatOutputFormat.setOutput has not been called for the passed context
   */
  public static HCatSchema getTableSchema(JobContext context) throws IOException;
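A matching write-side driver sketch, again hypothetical except for the setOutput, setSchema, and getTableSchema calls documented above; the HCatTableInfo factory method (getOutputTableInfo), its arguments, and the partition-value map are assumptions made for illustration.

import java.util.HashMap;
import java.util.Map;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;
import org.apache.hcatalog.mapreduce.HCatTableInfo;

public class WriteDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "hcat-write-example");

        // Name the target table and the single partition this job creates
        // (assumed factory method and arguments).
        Map<String, String> partitionValues = new HashMap<String, String>();
        partitionValues.put("date", "20100819");
        HCatTableInfo outputInfo = HCatTableInfo.getOutputTableInfo(
                "thrift://localhost:9083", null, "default", "processedevents",
                partitionValues);

        // setOutput must be the first call on HCatOutputFormat.
        HCatOutputFormat.setOutput(job, outputInfo);

        // Write with the table's current schema; a partition schema that
        // differs from it must follow the rules in the next section.
        HCatOutputFormat.setSchema(job, HCatOutputFormat.getTableSchema(job));

        job.setOutputFormatClass(HCatOutputFormat.class);
        // ... configure a mapper/reducer that emits HCatRecord values ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}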
+ +
Partition Schema Semantics

The partition schema specified can be different from the current table level schema. The rules about what kinds of schema are allowed are listed below (a sketch follows the list):

  • If a column is present in both the table schema and the partition schema, the type for the column should match.
  • If the partition schema has fewer columns than the table level schema, then only the columns at the end of the table schema are allowed to be absent. Columns in the middle cannot be absent. So if the table schema is "c1,c2,c3", the partition schema can be "c1" or "c1,c2" but not "c1,c3" or "c2,c3".
  • If the partition schema has extra columns, then the extra columns should appear after the table schema. So if the table schema is "c1,c2", the partition schema can be "c1,c2,c3" but not "c1,c3,c4". The table schema is automatically updated to have the extra column. In the previous example, the table schema will become "c1,c2,c3" after the completion of the job.
  • The partition keys are not allowed to be present in the schema being written out.
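As a small sketch of the trailing-columns rule: assuming HCatSchema exposes its ordered field list via getFields() and can be constructed from such a list (both assumptions here), a job could legally write a prefix of the table schema like this.

import java.util.ArrayList;
import java.util.List;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hcatalog.data.schema.HCatFieldSchema;
import org.apache.hcatalog.data.schema.HCatSchema;
import org.apache.hcatalog.mapreduce.HCatOutputFormat;

public class PartitionSchemaExample {
    // If the table schema is "c1,c2,c3", a partition may legally be written
    // with "c1,c2" (trailing column absent) but not with "c1,c3".
    static void keepLeadingColumns(Job job, int columnsToKeep) throws Exception {
        HCatSchema tableSchema = HCatOutputFormat.getTableSchema(job);
        List<HCatFieldSchema> prefix = new ArrayList<HCatFieldSchema>(
                tableSchema.getFields().subList(0, columnsToKeep));
        HCatOutputFormat.setSchema(job, new HCatSchema(prefix)); // e.g. keep "c1,c2"
    }
}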