incubator-hcatalog-commits mailing list archives

From ga...@apache.org
Subject svn commit: r1331643 - in /incubator/hcatalog/trunk: ./ src/docs/ src/docs/src/documentation/content/xdocs/ src/java/org/apache/hcatalog/data/transfer/ src/java/org/apache/hcatalog/mapreduce/
Date Sat, 28 Apr 2012 00:47:31 GMT
Author: gates
Date: Sat Apr 28 00:47:30 2012
New Revision: 1331643

URL: http://svn.apache.org/viewvc?rev=1331643&view=rev
Log:
HCATALOG-368 Documentation improvements: doc set & API docs

Modified:
    incubator/hcatalog/trunk/CHANGES.txt
    incubator/hcatalog/trunk/build.xml
    incubator/hcatalog/trunk/src/docs/overview.html
    incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java
    incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java

Modified: incubator/hcatalog/trunk/CHANGES.txt
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/CHANGES.txt?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/CHANGES.txt (original)
+++ incubator/hcatalog/trunk/CHANGES.txt Sat Apr 28 00:47:30 2012
@@ -26,6 +26,8 @@ Trunk (unreleased changes)
   HCAT-328 HCatLoader should report its input size so pig can estimate the number of reducers
(traviscrawford via gates)
 
   IMPROVEMENTS
+  HCAT-368 Documentation improvements: doc set & API docs (lefty via gates)
+
   HCAT-387 Trunk should point to 0.10 snapshot to match hive trunk (toffer)
 
   HCAT-329 HCatalog build fails with pig 0.9 (traviscrawford via hashutosh)

Modified: incubator/hcatalog/trunk/build.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/build.xml?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/build.xml (original)
+++ incubator/hcatalog/trunk/build.xml Sat Apr 28 00:47:30 2012
@@ -471,6 +471,7 @@
              author="true"
              version="true"
              use="true"
+             noqualifier="all"
              windowtitle="HCatalog ${hcatalog.version} API"
              doctitle="HCatalog ${hcatalog.version} API"
              failonerror="true">

Modified: incubator/hcatalog/trunk/src/docs/overview.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/overview.html?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/docs/overview.html (original)
+++ incubator/hcatalog/trunk/src/docs/overview.html Sat Apr 28 00:47:30 2012
@@ -52,54 +52,50 @@
 <a name="HCatalog"></a>
 <h2 class="h3">HCatalog </h2>
 <div class="section">
-<p>HCatalog is a table management and storage management layer for Hadoop that enables
users with different data processing tools &ndash; Pig, MapReduce, Hive, Streaming &ndash;
to more easily read and write data on the grid. HCatalog&rsquo;s table abstraction presents
users with a relational view of data in the Hadoop distributed file system (HDFS) and ensures
that users need not worry about where or in what format their data is stored &ndash; RCFile
format, text files, sequence files. </p>
-<p>(Note: In this release, Streaming is not supported. Also, HCatalog supports only
writing RCFile formatted files and only reading PigStorage formated text files.)</p>
+<p>HCatalog is a table and storage management layer for Hadoop that enables users with
different data processing tools &ndash; Pig, MapReduce, and Hive &ndash; to more easily
read and write data on the grid. HCatalog&rsquo;s table abstraction presents users with
a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users
need not worry about where or in what format their data is stored &ndash; RCFile format,
text files, or SequenceFiles. </p>
+<p>HCatalog supports reading and writing files in any format for which a SerDe can
be written. By default, HCatalog supports RCFile, CSV, JSON, and SequenceFile formats. To
use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.</p>
 <p></p>
-     
-      
-      
+
+  
 <a name="HCatalog+Architecture"></a>
 <h2 class="h3">HCatalog Architecture</h2>
 <div class="section">
-<p>HCatalog is built on top of the Hive metastore and incorporates components from
the Hive DDL. HCatalog provides read and write interfaces for Pig and MapReduce and a command
line interface for data definitions.</p>
-<p>(Note: HCatalog notification is not available in this release.)</p>
+<p>HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. HCatalog
provides read and write interfaces for Pig and MapReduce and uses Hive's command line interface
for issuing data definition and metadata exploration commands.</p>
 <p></p>
 <a name="Interfaces"></a>
 <h3 class="h4">Interfaces</h3>
-<p>The HCatalog interface for Pig &ndash; HCatLoader and HCatStorer &ndash;
is an implementation of the Pig load and store interfaces. HCatLoader accepts a table to read
data from; you can indicate which partitions to scan by immediately following the load statement
with a partition filter statement. HCatStorer accepts a table to write to and a specification
of partition keys to create a new partition. Currently HCatStorer only supports writing to
one partition. HCatLoader and HCatStorer are implemented on top of HCatInputFormat and HCatOutputFormat
respectively </p>
-<p>The HCatalog interface for MapReduce &ndash; HCatInputFormat and HCatOutputFormat
&ndash; is an implementation of Hadoop InputFormat and OutputFormat. HCatInputFormat accepts
a table to read data from and a selection predicate to indicate which partitions to scan.
HCatOutputFormat accepts a table to write to and a specification of partition keys to create
a new partition. Currently HCatOutputFormat only supports writing to one partition.</p>
-<p>
-<strong>Note:</strong> Currently there is no Hive-specific interface. Since HCatalog
uses Hive's metastore, Hive can read data in HCatalog directly as long as a SerDe for that
data already exists. In the future we plan to write a HCatalogSerDe so that users won't need
storage-specific SerDes and so that Hive users can write data to HCatalog. Currently, this
is supported - if a Hive user writes data in the RCFile format, it is possible to read the
data through HCatalog. </p>
-<p>Data is defined using HCatalog's command line interface (CLI). The HCatalog CLI
supports most of the DDL portion of Hive's query language, allowing users to create, alter,
drop tables, etc. The CLI also supports the data exploration part of the Hive command line,
such as SHOW TABLES, DESCRIBE TABLE, etc.</p>
+<p>The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which implement
the Pig load and store interfaces respectively. HCatLoader accepts a table to read data from;
you can indicate which partitions to scan by immediately following the load statement with
a partition filter statement. HCatStorer accepts a table to write to and optionally a specification
of partition keys to create a new partition. You can write to a single partition by specifying
the partition key(s) and value(s) in the STORE clause; and you can write to multiple partitions
if the partition key(s) are columns in the data being stored. HCatLoader is implemented on
top of HCatInputFormat and HCatStorer is implemented on top of HCatOutputFormat (see <a
href="loadstore.html">HCatalog Load and Store</a>).</p>
+<p>HCatInputFormat and HCatOutputFormat are HCatalog's interface for MapReduce; they
implement Hadoop's InputFormat and OutputFormat, respectively. HCatInputFormat accepts a table
to read data from and optionally a selection predicate to indicate which partitions to scan.
HCatOutputFormat accepts a table to write to and optionally a specification of partition keys
to create a new partition. You can write to a single partition by specifying the partition
key(s) and value(s) in the STORE clause; and you can write to multiple partitions if the partition
key(s) are columns in the data being stored. (See <a href="inputoutput.html">HCatalog
Input and Output</a>.)</p>
+<p>Note: There is no Hive-specific interface. Since HCatalog uses Hive's metastore,
Hive can read data in HCatalog directly.</p>
+<p>Data is defined using HCatalog's command line interface (CLI). The HCatalog CLI
supports all Hive DDL that does not require MapReduce to execute, allowing users to create,
alter, drop tables, etc. (Unsupported Hive DDL includes import/export, CREATE TABLE AS SELECT,
ALTER TABLE options REBUILD and CONCATENATE, and ANALYZE TABLE ... COMPUTE STATISTICS.) The
CLI also supports the data exploration part of the Hive command line, such as SHOW TABLES,
DESCRIBE TABLE, etc. (see the <a href="cli.html">HCatalog Command Line Interface</a>).</p>
 <a name="Data+Model"></a>
 <h3 class="h4">Data Model</h3>
-<p>HCatalog presents a relational view of data in HDFS. Data is stored in tables and
these tables can be placed in databases. Tables can also be hash partitioned on one or more
keys; that is, for a given value of a key (or set of keys) there will be one partition that
contains all rows with that value (or set of values). For example, if a table is partitioned
on date and there are three days of data in the table, there will be three partitions in the
table. New partitions can be added to a table, and partitions can be dropped from a table.
Partitioned tables have no partitions at create time. Unpartitioned tables effectively have
one default partition that must be created at table creation time. There is no guaranteed
read consistency when a partition is dropped.</p>
-<p>Partitions contain records. Once a partition is created records cannot be added
to it, removed from it, or updated in it. (In the future some ability to integrate changes
to a partition will be added.) Partitions are multi-dimensional and not hierarchical. Records
are divided into columns. Columns have a name and a datatype. HCatalog supports the same datatypes
as Hive. </p>
+<p>HCatalog presents a relational view of data. Data is stored in tables and these
tables can be placed in databases. Tables can also be hash partitioned on one or more keys;
that is, for a given value of a key (or set of keys) there will be one partition that contains
all rows with that value (or set of values). For example, if a table is partitioned on date
and there are three days of data in the table, there will be three partitions in the table.
New partitions can be added to a table, and partitions can be dropped from a table. Partitioned
tables have no partitions at create time. Unpartitioned tables effectively have one default
partition that must be created at table creation time. There is no guaranteed read consistency
when a partition is dropped.</p>
+<p>Partitions contain records. Once a partition is created records cannot be added
to it, removed from it, or updated in it. Partitions are multi-dimensional and not hierarchical.
Records are divided into columns. Columns have a name and a datatype. HCatalog supports the
same datatypes as Hive (see <a href="loadstore.html">HCatalog Load and Store</a>).
</p>
 </div>
      
   
 <a name="Data+Flow+Example"></a>
 <h2 class="h3">Data Flow Example</h2>
 <div class="section">
-<p>This simple data flow example shows how HCatalog is used to move data from the grid
into a database. 
-  From the database, the data can then be analyzed using Hive.</p>
+<p>This simple data flow example shows how HCatalog can help grid users share and access
data.</p>
 <p>
 <strong>First</strong> Joe in data acquisition uses distcp to get data onto the
grid.</p>
 <pre class="code">
 hadoop distcp file:///file.dat hdfs://data/rawevents/20100819/data
 
-hcat "alter table rawevents add partition 20100819 hdfs://data/rawevents/20100819/data"
+hcat "alter table rawevents add partition (ds='20100819') location 'hdfs://data/rawevents/20100819/data'"
 </pre>
 <p>
 <strong>Second</strong> Sally in data processing uses Pig to cleanse and prepare
the data.</p>
-<p>Without HCatalog, Sally must be manually informed by Joe that data is available,
or use Oozie and poll on HDFS.</p>
+<p>Without HCatalog, Sally must be manually informed by Joe when data is available,
or poll on HDFS.</p>
 <pre class="code">
 A = load '/data/rawevents/20100819/data' as (alpha:int, beta:chararray, &hellip;);
 B = filter A by bot_finder(zeta) = 0;
 &hellip;
 store Z into 'data/processedevents/20100819/data';
 </pre>
-<p>With HCatalog, Oozie will be notified by HCatalog data is available and can then
start the Pig job</p>
+<p>With HCatalog, a JMS message is sent when data is available, and the Pig
job can then be started.</p>
 <pre class="code">
 A = load 'rawevents' using HCatLoader;
 B = filter A by date = '20100819' and by bot_finder(zeta) = 0;
@@ -115,20 +111,20 @@ alter table processedevents add partitio
 select advertiser_id, count(clicks)
 from processedevents
 where date = '20100819' 
-group by adverstiser_id;
+group by advertiser_id;
 </pre>
 <p>With HCatalog, Robert does not need to modify the table structure.</p>
 <pre class="code">
 select advertiser_id, count(clicks)
 from processedevents
 where date = &lsquo;20100819&rsquo; 
-group by adverstiser_id;
+group by advertiser_id;
 </pre>
 </div>
   
 <div class="copyright">
         Copyright &copy;
-         2011 <a href="http://www.apache.org/licenses/">The Apache Software Foundation</a>
+         2012 <a href="http://www.apache.org/licenses/">The Apache Software Foundation</a>
 </div>
 </div>
 </body>
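
The revised overview describes HCatInputFormat and HCatOutputFormat but shows no
driver code. The following minimal Java sketch illustrates the pattern under
stated assumptions: the database/table names and partition filter are the
hypothetical ones from the data flow example, and the InputJobInfo.create and
OutputJobInfo.create factory signatures and HCatOutputFormat.getTableSchema are
assumed from this release's input/output API (see inputoutput.html for the
authoritative forms).

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hcatalog.mapreduce.HCatInputFormat;
    import org.apache.hcatalog.mapreduce.HCatOutputFormat;
    import org.apache.hcatalog.mapreduce.InputJobInfo;
    import org.apache.hcatalog.mapreduce.OutputJobInfo;

    public class HCatExampleDriver {
      public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "hcat-example");

        // Read from rawevents, scanning only the partitions that match
        // the selection predicate.
        HCatInputFormat.setInput(job,
            InputJobInfo.create("default", "rawevents", "ds=\"20100819\""));
        job.setInputFormatClass(HCatInputFormat.class);

        // Write a new partition of processedevents. Calling setSchema is
        // optional; the table schema is used for the partition by default.
        Map<String, String> partitionValues = new HashMap<String, String>();
        partitionValues.put("ds", "20100819");
        HCatOutputFormat.setOutput(job,
            OutputJobInfo.create("default", "processedevents", partitionValues));
        HCatOutputFormat.setSchema(job, HCatOutputFormat.getTableSchema(job));
        job.setOutputFormatClass(HCatOutputFormat.class);

        // Set mapper/reducer classes and output key/value types here.
        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }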

Modified: incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml (original)
+++ incubator/hcatalog/trunk/src/docs/src/documentation/content/xdocs/index.xml Sat Apr 28
00:47:30 2012
@@ -25,8 +25,8 @@
    <section>
       <title>HCatalog </title>
       
-       <p>HCatalog is a table and storage management layer for Hadoop that enables
users with different data processing tools – Pig, MapReduce, and Hive – to more
easily read and write data on the grid. HCatalog’s table abstraction presents users with
a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users
need not worry about where or in what format their data is stored – RCFile format, text
files, or sequence files. </p>
-<p>HCatalog supports reading and writing files in any format for which a SerDe can
be written. By default, HCatalog supports RCFile, CSV, JSON, and sequence file formats. To
use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.</p>
+       <p>HCatalog is a table and storage management layer for Hadoop that enables
users with different data processing tools – Pig, MapReduce, and Hive – to more
easily read and write data on the grid. HCatalog’s table abstraction presents users with
a relational view of data in the Hadoop distributed file system (HDFS) and ensures that users
need not worry about where or in what format their data is stored – RCFile format, text
files, or SequenceFiles. </p>
+<p>HCatalog supports reading and writing files in any format for which a SerDe can
be written. By default, HCatalog supports RCFile, CSV, JSON, and SequenceFile formats. To
use a custom format, you must provide the InputFormat, OutputFormat, and SerDe.</p>
 <p></p>
 <figure src="images/hcat-product.jpg" align="left" alt="HCatalog Product"/>
 
@@ -36,16 +36,15 @@
       
       <section>
       <title>HCatalog Architecture</title>
-      <p>HCatalog is built on top of the Hive metastore and incorporates components
from the Hive DDL. HCatalog provides read and write interfaces for Pig and MapReduce and uses
-      Hive's command line interface for issuing data definition and metadata exploration
commands.</p>
+      <p>HCatalog is built on top of the Hive metastore and incorporates Hive's DDL.
HCatalog provides read and write interfaces for Pig and MapReduce and uses Hive's command
line interface for issuing data definition and metadata exploration commands.</p>
 
 <p></p>
 
 <section>
 <title>Interfaces</title>   
-<p>The HCatalog interface for Pig – HCatLoader and HCatStorer – is an implementation
of the Pig load and store interfaces. HCatLoader accepts a table to read data from; you can
indicate which partitions to scan by immediately following the load statement with a partition
filter statement. HCatStorer accepts a table to write to and optionally a specification of
partition keys to create a new partition. You can write to a single partition by specifying
the partition key(s) and value(s) in the STORE clause; and you can write to multiple partitions
if the partition key(s) are columns in the data being stored. HCatLoader and HCatStorer are
implemented on top of HCatInputFormat and HCatOutputFormat, respectively (see <a href="loadstore.html">HCatalog
Load and Store</a>).</p>
+<p>The HCatalog interface for Pig consists of HCatLoader and HCatStorer, which implement
the Pig load and store interfaces respectively. HCatLoader accepts a table to read data from;
you can indicate which partitions to scan by immediately following the load statement with
a partition filter statement. HCatStorer accepts a table to write to and optionally a specification
of partition keys to create a new partition. You can write to a single partition by specifying
the partition key(s) and value(s) in the STORE clause; and you can write to multiple partitions
if the partition key(s) are columns in the data being stored. HCatLoader is implemented on
top of HCatInputFormat and HCatStorer is implemented on top of HCatOutputFormat (see <a
href="loadstore.html">HCatalog Load and Store</a>).</p>
 
-<p>The HCatalog interface for MapReduce – HCatInputFormat and HCatOutputFormat
– is an implementation of Hadoop InputFormat and OutputFormat. HCatInputFormat accepts
a table to read data from and optionally a selection predicate to indicate which partitions
to scan. HCatOutputFormat accepts a table to write to and optionally a specification of partition
keys to create a new partition. You can write to a single partition by specifying the partition
key(s) and value(s) in the STORE clause; and you can write to multiple partitions if the partition
key(s) are columns in the data being stored. (See <a href="inputoutput.html">HCatalog
Input and Output</a>.)</p>
+<p>HCatInputFormat and HCatOutputFormat are HCatalog's interface for MapReduce; they
implement Hadoop's InputFormat and OutputFormat, respectively. HCatInputFormat accepts a table
to read data from and optionally a selection predicate to indicate which partitions to scan.
HCatOutputFormat accepts a table to write to and optionally a specification of partition keys
to create a new partition. You can write to a single partition by specifying the partition
key(s) and value(s) in the STORE clause; and you can write to multiple partitions if the partition
key(s) are columns in the data being stored. (See <a href="inputoutput.html">HCatalog
Input and Output</a>.)</p>
 
 <p>Note: There is no Hive-specific interface. Since HCatalog uses Hive's metastore,
Hive can read data in HCatalog directly.</p>
 

Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java
(original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/data/transfer/DataTransferFactory.java
Sat Apr 28 00:47:30 2012
@@ -33,9 +33,9 @@ import org.apache.hcatalog.data.transfer
 public class DataTransferFactory {
 
 	/**
-	 * This should be called once from master node to obtain an instance of {@link HCatReader}
-	 * @param re built using {@link ReadEntity.Builder}
-	 * @param config Any configuration which master node wants to pass to HCatalog
+	 * This should be called once from master node to obtain an instance of {@link HCatReader}.
+	 * @param re ReadEntity built using {@link ReadEntity.Builder}
+	 * @param config any configuration which master node wants to pass to HCatalog
 	 * @return {@link HCatReader}
 	 */
 	public static HCatReader getHCatReader(final ReadEntity re, final Map<String,String>
config) {
@@ -44,9 +44,9 @@ public class DataTransferFactory {
 	}
 
 	/**
-	 * This should only be called once from every slave nodes to obtain an instance of {@link
HCatReader}
-	 * @param split obtained at master node.
-	 * @param config obtained at master node.
+	 * This should only be called once from every slave node to obtain an instance of {@link
HCatReader}.
+	 * @param split input split obtained at master node
+	 * @param config configuration obtained at master node
 	 * @return {@link HCatReader}
 	 */
 	public static HCatReader getHCatReader(final InputSplit split, final Configuration config)
{
@@ -55,11 +55,11 @@ public class DataTransferFactory {
 	}
 
 	/**
-	 * This should only be called once from every slave nodes to obtain an instance of {@link
HCatReader}
-	 * This should be called if external system has some state to provide to HCatalog
-	 * @param split obtained at master node.
-	 * @param config obtained at master node.
-	 * @param sp 
+	 * This should only be called once from every slave node to obtain an instance of {@link
HCatReader}.
+	 * This should be called if an external system has some state to provide to HCatalog.
+	 * @param split input split obtained at master node
+	 * @param config configuration obtained at master node
+	 * @param sp {@link StateProvider}
 	 * @return {@link HCatReader}
 	 */
 	public static HCatReader getHCatReader(final InputSplit split, final Configuration config,
StateProvider sp) {
@@ -67,9 +67,9 @@ public class DataTransferFactory {
 		return new HCatInputFormatReader(split, config, sp);
 	}
 	
-	/** This should be called at master node to obtain an instance of {@link HCatWriter}
-	 * @param we built using {@link WriteEntity.Builder}
-	 * @param config Any configuration which master wants to pass to HCatalog
+	/** This should be called at master node to obtain an instance of {@link HCatWriter}.
+	 * @param we WriteEntity built using {@link WriteEntity.Builder}
+	 * @param config any configuration which master wants to pass to HCatalog
 	 * @return {@link HCatWriter}
 	 */
 	public static HCatWriter getHCatWriter(final WriteEntity we, final Map<String,String>
config) {
@@ -77,8 +77,8 @@ public class DataTransferFactory {
 		return new HCatOutputFormatWriter(we, config);
 	}
 
-	/** This should be called at slave nodes to obtain an instance of {@link HCatWriter}
-	 * @param cntxt {@link WriterContext} obtained at master node.
+ 	/** This should be called at slave nodes to obtain an instance of {@link HCatWriter}.
+ 	 * @param cntxt {@link WriterContext} obtained at master node
 	 * @return {@link HCatWriter}
 	 */
 	public static HCatWriter getHCatWriter(final WriterContext cntxt) {
@@ -86,10 +86,10 @@ public class DataTransferFactory {
 		return getHCatWriter(cntxt, DefaultStateProvider.get());
 	}
 	
-	/** This should be called at slave nodes to obtain an instance of {@link HCatWriter}
-	 * If external system has some mechanism for providing state to HCatalog, this constructor
+ 	/** This should be called at slave nodes to obtain an instance of {@link HCatWriter}.
+ 	 *  If an external system has some mechanism for providing state to HCatalog, this constructor
 	 *  can be used.
-	 * @param cntxt {@link WriterContext} obtained at master node.
+ 	 * @param cntxt {@link WriterContext} obtained at master node
 	 * @param sp {@link StateProvider} 
 	 * @return {@link HCatWriter}
 	 */
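
The factory methods above split reading into a master phase and a slave phase.
The sketch below shows the read side. The getHCatReader signatures are the ones
documented above, while ReadEntity.Builder.withTable, prepareRead,
ReaderContext.getSplits/getConf, and read are assumptions about the rest of the
transfer API, and the table name is hypothetical.

    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.Map;

    import org.apache.hadoop.mapreduce.InputSplit;
    import org.apache.hcatalog.data.HCatRecord;
    import org.apache.hcatalog.data.transfer.DataTransferFactory;
    import org.apache.hcatalog.data.transfer.HCatReader;
    import org.apache.hcatalog.data.transfer.ReadEntity;
    import org.apache.hcatalog.data.transfer.ReaderContext;

    public class TransferReadSketch {
      public static void main(String[] args) throws Exception {
        // Master node: called once to describe what to read.
        Map<String, String> config = new HashMap<String, String>();
        ReadEntity entity = new ReadEntity.Builder().withTable("rawevents").build();
        HCatReader masterReader = DataTransferFactory.getHCatReader(entity, config);
        ReaderContext context = masterReader.prepareRead(); // serialized and shipped to slaves

        // Slave nodes: called once per split, using state from the master.
        for (InputSplit split : context.getSplits()) {
          HCatReader reader = DataTransferFactory.getHCatReader(split, context.getConf());
          Iterator<HCatRecord> records = reader.read();
          while (records.hasNext()) {
            HCatRecord record = records.next();
            // process the record here
          }
        }
      }
    }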

Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatInputFormat.java Sat
Apr 28 00:47:30 2012
@@ -22,16 +22,16 @@ import java.io.IOException;
 
 import org.apache.hadoop.mapreduce.Job;
 
-/** The InputFormat to use to read data from HCat */
+/** The InputFormat to use to read data from HCatalog. */
 public class HCatInputFormat extends HCatBaseInputFormat {
 
   /**
-   * Set the input to use for the Job. This queries the metadata server with
-   * the specified partition predicates, gets the matching partitions, puts
-   * the information in the conf object. The inputInfo object is updated with
-   * information needed in the client context
+   * Set the input information to use for the job. This queries the metadata server 
+   * with the specified partition predicates, gets the matching partitions, and 
+   * puts the information in the conf object. The inputInfo object is updated 
+   * with information needed in the client context.
    * @param job the job object
-   * @param inputJobInfo the input info for table to read
+   * @param inputJobInfo the input information about the table to read
    * @throws IOException the exception in communicating with the metadata server
    */
   public static void setInput(Job job,
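
When a job reads through HCatInputFormat, each map input value is an HCatRecord
and the input key can be ignored. A small map task sketch follows; the column
layout (advertiser_id in column 0, clicks in column 1) is hypothetical.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hcatalog.data.HCatRecord;

    public class ReadMapper
        extends Mapper<WritableComparable, HCatRecord, Text, IntWritable> {
      @Override
      protected void map(WritableComparable key, HCatRecord value, Context ctx)
          throws IOException, InterruptedException {
        // Hypothetical layout: column 0 is advertiser_id, column 1 is clicks.
        String advertiser = (String) value.get(0);
        int clicks = (Integer) value.get(1);
        ctx.write(new Text(advertiser), new IntWritable(clicks));
      }
    }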

Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
(original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/HCatOutputFormat.java
Sat Apr 28 00:47:30 2012
@@ -51,8 +51,8 @@ import org.apache.hcatalog.common.HCatUt
 import org.apache.hcatalog.data.HCatRecord;
 import org.apache.hcatalog.data.schema.HCatSchema;
 
-/** The OutputFormat to use to write data to HCat. The key value is ignored and
- * and should be given as null. The value is the HCatRecord to write.*/
+/** The OutputFormat to use to write data to HCatalog. The key value is ignored and
+ *  should be given as null. The value is the HCatRecord to write.*/
 public class HCatOutputFormat extends HCatBaseOutputFormat {
 
     static final private Log LOG = LogFactory.getLog(HCatOutputFormat.class);
@@ -61,10 +61,11 @@ public class HCatOutputFormat extends HC
     private static boolean harRequested;
 
     /**
-     * Set the info about the output to write for the Job. This queries the metadata server
-     * to find the StorageHandler to use for the table.  Throws error if partition is already
published.
+     * Set the information about the output to write for the job. This queries the metadata
server
+     * to find the StorageHandler to use for the table.  It throws an error if the 
+     * partition is already published.
      * @param job the job object
-     * @param outputJobInfo the table output info
+     * @param outputJobInfo the table output information for the job
      * @throws IOException the exception in communicating with the metadata server
      */
     @SuppressWarnings("unchecked")
@@ -204,6 +205,7 @@ public class HCatOutputFormat extends HC
      * table schema is used by default for the partition if this is not called.
      * @param job the job object
      * @param schema the schema for the data
+     * @throws IOException
      */
     public static void setSchema(final Job job, final HCatSchema schema) throws IOException
{
 
@@ -214,11 +216,12 @@ public class HCatOutputFormat extends HC
     }
 
     /**
-     * Get the record writer for the job. Uses the StorageHandler's default OutputFormat
-     * to get the record writer.
-     * @param context the information about the current task.
-     * @return a RecordWriter to write the output for the job.
+     * Get the record writer for the job. This uses the StorageHandler's default 
+     * OutputFormat to get the record writer.
+     * @param context the information about the current task
+     * @return a RecordWriter to write the output for the job
      * @throws IOException
+     * @throws InterruptedException
      */
     @Override
     public RecordWriter<WritableComparable<?>, HCatRecord>
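
The null-key convention in the class comment above pairs naturally with a
reduce task that emits HCatRecords. A sketch follows, assuming the same
hypothetical (advertiser_id, clicks) layout as the reader sketch earlier;
DefaultHCatRecord is used as the concrete record class.

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.io.WritableComparable;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hcatalog.data.DefaultHCatRecord;
    import org.apache.hcatalog.data.HCatRecord;

    public class WriteReducer
        extends Reducer<Text, IntWritable, WritableComparable<?>, HCatRecord> {
      @Override
      protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
          throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable v : values) {
          total += v.get();
        }
        // Hypothetical two-column output schema: (advertiser_id, clicks).
        HCatRecord record = new DefaultHCatRecord(2);
        record.set(0, key.toString());
        record.set(1, total);
        ctx.write(null, record);  // the key is ignored and must be given as null
      }
    }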

Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/PartInfo.java Sat Apr
28 00:47:30 2012
@@ -27,7 +27,7 @@ import org.apache.hadoop.hive.ql.plan.Ta
 import org.apache.hcatalog.data.schema.HCatSchema;
 import org.apache.hcatalog.mapreduce.HCatStorageHandler;
 
-/** The Class used to serialize the partition information read from the metadata server that
maps to a partition */
+/** The Class used to serialize the partition information read from the metadata server that
maps to a partition. */
 public class PartInfo implements Serializable {
 
   /** The serialization version */
@@ -63,6 +63,8 @@ public class PartInfo implements Seriali
    * @param storageHandler the storage handler
    * @param location the location
    * @param hcatProperties hcat-specific properties at the partition
+   * @param jobProperties the job properties
+   * @param tableInfo the table information
    */
   public PartInfo(HCatSchema partitionSchema, HCatStorageHandler storageHandler,
                   String location, Properties hcatProperties, 
@@ -116,8 +118,8 @@ public class PartInfo implements Seriali
   }
 
   /**
-   * Gets the value of hcatProperties.
-   * @return the hcatProperties
+   * Gets the input storage handler properties.
+   * @return HCat-specific properties set at the partition 
    */
   public Properties getInputStorageHandlerProperties() {
     return hcatProperties;
@@ -147,10 +149,18 @@ public class PartInfo implements Seriali
     return partitionValues;
   }
 
+  /**
+   * Gets the job properties.
+   * @return a map of the job properties
+   */
   public Map<String,String> getJobProperties() {
     return jobProperties;
   }
 
+  /**
+   * Gets the HCatalog table information.
+   * @return the table information
+   */
   public HCatTableInfo getTableInfo() {
     return tableInfo;
   }

Modified: incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java
URL: http://svn.apache.org/viewvc/incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java?rev=1331643&r1=1331642&r2=1331643&view=diff
==============================================================================
--- incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java (original)
+++ incubator/hcatalog/trunk/src/java/org/apache/hcatalog/mapreduce/StorerInfo.java Sat Apr
28 00:47:30 2012
@@ -19,7 +19,7 @@ package org.apache.hcatalog.mapreduce;
 import java.io.Serializable;
 import java.util.Properties;
 
-/** Info about the storer to use for writing the data */
+/** Information about the storer to use for writing the data. */
 public class StorerInfo implements Serializable {
 
     /** The serialization version */
@@ -37,12 +37,12 @@ public class StorerInfo implements Seria
     private String storageHandlerClass;
 
     /**
-     * Initialize the storer info
-     * @param ifClass
-     * @param ofClass
-     * @param serdeClass
-     * @param storageHandlerClass
-     * @param properties
+     * Initialize the storer information.
+     * @param ifClass the input format class
+     * @param ofClass the output format class
+     * @param serdeClass the SerDe class
+     * @param storageHandlerClass the storage handler class
+     * @param properties the properties for the storage handler
      */
     public StorerInfo(String ifClass, String ofClass, String serdeClass, String storageHandlerClass,
Properties properties) {
       super();
@@ -53,35 +53,50 @@ public class StorerInfo implements Seria
       this.properties = properties;
     }
 
+    /**
+     * @return the input format class
+     */
     public String getIfClass() {
         return ifClass;
     }
 
+    /**
+     * @param ifClass the input format class
+     */
     public void setIfClass(String ifClass) {
         this.ifClass = ifClass;
     }
 
+    /**
+     * @return the output format class
+     */
     public String getOfClass() {
         return ofClass;
     }
 
+    /**
+     * @return the serdeClass
+     */
     public String getSerdeClass() {
         return serdeClass;
     }
 
+    /**
+     * @return the storageHandlerClass
+     */
     public String getStorageHandlerClass() {
         return storageHandlerClass;
     }
 
     /**
-     * @return the properties
+     * @return the storer properties
      */
     public Properties getProperties() {
       return properties;
     }
 
     /**
-     * @param properties the properties to set
+     * @param properties the storer properties to set 
      */
     public void setProperties(Properties properties) {
       this.properties = properties;
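
As an illustration of the newly documented constructor parameters, the sketch
below assembles storer information for an RCFile-backed table. The Hive class
names are real, but treating storageHandlerClass as optional (null) is an
assumption.

    import java.util.Properties;

    import org.apache.hcatalog.mapreduce.StorerInfo;

    StorerInfo storer = new StorerInfo(
        "org.apache.hadoop.hive.ql.io.RCFileInputFormat",        // ifClass
        "org.apache.hadoop.hive.ql.io.RCFileOutputFormat",       // ofClass
        "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe",  // serdeClass
        null,                      // storageHandlerClass (assumed optional)
        new Properties());         // storage handler properties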


