incubator-hcatalog-commits mailing list archives

From hashut...@apache.org
Subject svn commit: r1178503 - in /incubator/hcatalog/site/publish: about.pdf docs/r0.2.0/importexport.html docs/r0.2.0/importexport.pdf index.pdf issue_tracking.pdf linkmap.pdf mailing_lists.pdf privacypolicy.pdf releases.pdf version_control.pdf whoweare.pdf
Date Mon, 03 Oct 2011 18:32:28 GMT
Author: hashutosh
Date: Mon Oct  3 18:32:27 2011
New Revision: 1178503

URL: http://svn.apache.org/viewvc?rev=1178503&view=rev
Log:
Add regenerated site with import/export

Added:
    incubator/hcatalog/site/publish/docs/r0.2.0/importexport.html
    incubator/hcatalog/site/publish/docs/r0.2.0/importexport.pdf   (with props)
Modified:
    incubator/hcatalog/site/publish/about.pdf
    incubator/hcatalog/site/publish/index.pdf
    incubator/hcatalog/site/publish/issue_tracking.pdf
    incubator/hcatalog/site/publish/linkmap.pdf
    incubator/hcatalog/site/publish/mailing_lists.pdf
    incubator/hcatalog/site/publish/privacypolicy.pdf
    incubator/hcatalog/site/publish/releases.pdf
    incubator/hcatalog/site/publish/version_control.pdf
    incubator/hcatalog/site/publish/whoweare.pdf

Modified: incubator/hcatalog/site/publish/about.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/about.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/about.pdf (original) and incubator/hcatalog/site/publish/about.pdf
Mon Oct  3 18:32:27 2011 differ

Added: incubator/hcatalog/site/publish/docs/r0.2.0/importexport.html
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/importexport.html?rev=1178503&view=auto
==============================================================================
--- incubator/hcatalog/site/publish/docs/r0.2.0/importexport.html (added)
+++ incubator/hcatalog/site/publish/docs/r0.2.0/importexport.html Mon Oct  3 18:32:27 2011
@@ -0,0 +1,933 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
+<html>
+<head>
+<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
+<meta content="Apache Forrest" name="Generator">
+<meta name="Forrest-version" content="0.9">
+<meta name="Forrest-skin-name" content="pelt">
+<title>Import and Export Commands</title>
+<link type="text/css" href="skin/basic.css" rel="stylesheet">
+<link media="screen" type="text/css" href="skin/screen.css" rel="stylesheet">
+<link media="print" type="text/css" href="skin/print.css" rel="stylesheet">
+<link type="text/css" href="skin/profile.css" rel="stylesheet">
+<script src="skin/getBlank.js" language="javascript" type="text/javascript"></script><script
src="skin/getMenu.js" language="javascript" type="text/javascript"></script><script
src="skin/fontsize.js" language="javascript" type="text/javascript"></script>
+<link rel="shortcut icon" href="">
+</head>
+<body onload="init()">
+<script type="text/javascript">ndeSetTextSize();</script>
+<div id="top">
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+<script src="skin/breadcrumbs.js" language="JavaScript" type="text/javascript"></script>
+</div>
+<!--+
+    |header
+    +-->
+<div class="header">
+<!--+
+    |start group logo
+    +-->
+<div class="grouplogo">
+<a href=""><img class="logoImage" alt="HCatalog" src="images/hcat.jpg" title=""></a>
+</div>
+<!--+
+    |end group logo
+    +-->
+<!--+
+    |start Project Logo
+    +-->
+<div class="projectlogoA1">
+<a href=""><img class="logoImage" alt="HCatalog" src="images/hcat-box.jpg" title="A
table abstraction on top of data for use with Java MapReduce programs, Pig scripts and Hive
queries."></a>
+</div>
+<!--+
+    |end Project Logo
+    +-->
+<!--+
+    |start Tabs
+    +-->
+<ul id="tabs">
+<li class="current">
+<a class="selected" href="index.html">HCatalog 0.2.0 Documentation</a>
+</li>
+</ul>
+<!--+
+    |end Tabs
+    +-->
+</div>
+</div>
+<div id="main">
+<div id="publishedStrip">
+<!--+
+    |start Subtabs
+    +-->
+<div id="level2tabs"></div>
+<!--+
+    |end Subtabs
+    +-->
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<!--+
+    |breadtrail
+    +-->
+<div class="breadtrail">
+
+             &nbsp;
+           </div>
+<!--+
+    |start Menu, mainarea
+    +-->
+<!--+
+    |start Menu
+    +-->
+<div id="menu">
+<div onclick="SwitchMenu('menu_selected_1.1', 'skin/')" id="menu_selected_1.1Title" class="menutitle"
style="background-image: url('skin/images/chapter_open.gif');">HCatalog</div>
+<div id="menu_selected_1.1" class="selectedmenuitemgroup" style="display: block;">
+<div class="menuitem">
+<a href="index.html">Overview</a>
+</div>
+<div class="menuitem">
+<a href="install.html">Source Installation</a>
+</div>
+<div class="menuitem">
+<a href="rpminstall.html">RPM Installation</a>
+</div>
+<div class="menuitem">
+<a href="loadstore.html">Load &amp; Store Interfaces</a>
+</div>
+<div class="menuitem">
+<a href="inputoutput.html">Input &amp; Output Interfaces </a>
+</div>
+<div class="menupage">
+<div class="menupagetitle">Import &amp; Export Commands </div>
+</div>
+<div class="menuitem">
+<a href="cli.html">Command Line Interface </a>
+</div>
+<div class="menuitem">
+<a href="supportedformats.html">Storage Formats</a>
+</div>
+<div class="menuitem">
+<a href="dynpartition.html">Dynamic Partitioning</a>
+</div>
+<div class="menuitem">
+<a href="notification.html">Notification</a>
+</div>
+<div class="menuitem">
+<a href="api/index.html">API Docs</a>
+</div>
+</div>
+<div id="credit"></div>
+<div id="roundbottom">
+<img style="display: none" class="corner" height="15" width="15" alt="" src="skin/images/rc-b-l-15-1body-2menu-3menu.png"></div>
+<!--+
+  |alternative credits
+  +-->
+<div id="credit2"></div>
+</div>
+<!--+
+    |end Menu
+    +-->
+<!--+
+    |start content
+    +-->
+<div id="content">
+<div title="Portable Document Format" class="pdflink">
+<a class="dida" href="importexport.pdf"><img alt="PDF -icon" src="skin/images/pdfdoc.gif"
class="skin"><br>
+        PDF</a>
+</div>
+<h1>Import and Export Commands</h1>
+<div id="front-matter">
+<div id="minitoc-area">
+<ul class="minitoc">
+<li>
+<a href="#Overview">Overview</a>
+</li>
+<li>
+<a href="#Export+Command">Export Command</a>
+<ul class="minitoc">
+<li>
+<a href="#Syntax">Syntax</a>
+</li>
+<li>
+<a href="#Terms">Terms</a>
+</li>
+<li>
+<a href="#Usage">Usage</a>
+</li>
+<li>
+<a href="#Examples">Examples</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Import+Command">Import Command</a>
+<ul class="minitoc">
+<li>
+<a href="#Syntax-N1016B">Syntax</a>
+</li>
+<li>
+<a href="#Terms-N10180">Terms</a>
+</li>
+<li>
+<a href="#Usage-N10204">Usage</a>
+</li>
+<li>
+<a href="#Examples-N1025A">Examples</a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Usage+with+MapReduce">Usage with MapReduce</a>
+<ul class="minitoc">
+<li>
+<a href="#HCatEximOutputFormat">HCatEximOutputFormat </a>
+</li>
+<li>
+<a href="#HCatEximInputFormat">HCatEximInputFormat </a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Usage+with+Pig">Usage with Pig</a>
+<ul class="minitoc">
+<li>
+<a href="#HCatEximStorer">HCatEximStorer </a>
+</li>
+<li>
+<a href="#HCatEximLoader">HCatEximLoader </a>
+</li>
+</ul>
+</li>
+<li>
+<a href="#Use+Cases">Use Cases</a>
+</li>
+</ul>
+</div>
+</div>
+
+ <!-- ==================================================================== --> 
+  
+<a name="Overview"></a>
+<h2 class="h3">Overview</h2>
+<div class="section">
+<p>The HCatalog IMPORT and EXPORT commands enable you to:</p>
+<ul>
+  
+<li>Extract the data and the metadata associated with a table in HCatalog as a stand-alone
package so that these can be transferred across HCatalog instances.</li>
+  
+<li>Create the data and metadata associated with a table in a setup where there is
no HCatalog metastore. </li>
+  
+<li>Import the data and the metadata into an existing HCatalog instance. </li>
+  
+<li>Use the exported package as input to both Pig and MapReduce jobs. </li>
+  
+</ul>
+<p></p>
+<p>The output location of the exported dataset is a directory that has the following
structure:</p>
+<ul>
+  
+<li>A _metadata file that contains the metadata of the table and, if the table is partitioned,
of all the exported partitions.</li>
+  
+<li>A subdirectory hierarchy for each exported partition (or just one "data" subdirectory,
in case of a non-partitioned table) that contains the data files of the table/partitions.
</li>
+  
+</ul>
+<p></p>
+<p>Note that this directory structure can be created using the EXPORT command as well as HCatEximOutputFormat
for MapReduce or HCatEximStorer for Pig. And the data can be consumed using the IMPORT command
as well as HCatEximInputFormat for MapReduce or HCatEximLoader for Pig. </p>
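+<p>As an illustration, for a hypothetical table partitioned on a single column named "year", the
exported directory might look as follows (the exact data file names depend on the jobs that produced
them):</p>
+<pre class="code">
+exports/mytable/_metadata
+exports/mytable/year=2010/part-00000
+exports/mytable/year=2011/part-00000
+</pre>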
+</div>
+
+<!-- ==================================================================== -->
+
+<a name="Export+Command"></a>
+<h2 class="h3">Export Command</h2>
+<div class="section">
+<p>Exports a table to a specified location.</p>
+<a name="Syntax"></a>
+<h3 class="h4">Syntax</h3>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>EXPORT TABLE tablename [PARTITION (partcol1=val1, partcol2=val2, ...)] TO 'filepath'</p>
+            
+</td>
+        
+</tr>
+    
+</table>
+<a name="Terms"></a>
+<h3 class="h4">Terms</h3>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>TABLE tablename</p>
+            
+</td>
+            <td colspan="1" rowspan="1">
+               
+<p>The table to be exported. The table can be a simple table or a partitioned table.</p>
+               
+<p>If the table is partitioned, you can export a specific partition by specifying
values for all of the partitioning columns, or a subset of the partitions by specifying a subset
of the partition column/value specifications. In this case, the conditions are implicitly ANDed
to filter the partitions to be exported.</p>
+            
+</td>
+        
+</tr>
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>PARTITION (partcol=val ...)</p>
+            
+</td>
+            <td colspan="1" rowspan="1">
+               
+<p>The partition column/value specifications.</p>
+            
+</td>
+        
+</tr>         
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>TO 'filepath'</p>
+            
+</td>
+            <td colspan="1" rowspan="1">
+               
+<p>The filepath (in single quotes) designating the location for the exported table.
The file path can be:</p>
+               
+<ul>
+               
+<li>a relative path ('project/data1') </li>
+               
+<li>an absolute path ('/user/hcat/project/data1') </li>
+               
+<li>a full URI with scheme and, optionally, an authority ('hdfs://namenode:9000/user/hcat/project/data1')
</li>
+               
+</ul>
+            
+</td>
+        
+</tr> 
+   
+</table>
+<a name="Usage"></a>
+<h3 class="h4">Usage</h3>
+<p>The EXPORT command exports a table's data and metadata to the specified location.
Because the command actually <strong>copies</strong> the files defined for the
table/partitions, you should be aware of the following:</p>
+<ul>
+	
+<li>No record-level filtering, ordering, etc. is done as part of the export. </li>
+    
+<li>Since HCatalog only does file-level copies, the data is not transformed in any way
while performing the export/import. </li>
+    
+<li>You, the user, are responsible for ensuring that the correct binaries are available
in the target environment (compatible serde classes, hcat storage drivers, etc.).</li>
+	
+</ul>
+<p>Also, note the following:</p>
+<ul>
+	
+<li>The data and the metadata for the table to be exported should exist.</li>
+	
+<li>The target location must not exist or must be an empty directory. </li>
+	
+<li>You must have access as per the hcat access control mechanisms. </li>
+	
+<li>You should have write access to the target location. </li>
+	
+<li>Currently only HDFS is supported as the target filesystem in production mode;
pfile can also be used for testing purposes. </li>
+	
+</ul>
+<a name="Examples"></a>
+<h3 class="h4">Examples</h3>
+<p>The examples assume the following tables:</p>
+<ul>
+	
+<li>dept - non partitioned </li>
+    
+<li>empl - partitioned on emp_country, emp_state, has four partitions ("us"/"ka", "us"/"tn",
"in"/"ka", "in"/"tn") </li>
+	
+</ul>
+<p></p>
+<p>
+<strong>Example 1</strong>
+</p>
+<pre class="code">
+EXPORT TABLE dept TO 'exports/dept'; 
+</pre>
+<p>This example exports the entire table to the target location. The table and the
exported copy are now independent; any further changes to the table (data or metadata) do
not affect the exported copy. The exported copy can be manipulated or deleted without any effect
on the table.</p>
+<ul>
+	
+<li>output directory: exports/dept </li>
+	
+<li>_metadata - the metadata file </li>
+	
+<li>data - a directory which now contains all the data files </li>
+	
+</ul>
+<p></p>
+<p>
+<strong>Example 2</strong>
+</p>
+<pre class="code">
+EXPORT TABLE empl TO 'exports/empl'; 
+</pre>
+<p>This example exports the entire table including all the partitions' data/metadata
to the target location.</p>
+<ul>
+
+<li>output directory: exports/empl </li>
+
+<li>_metadata - the metadata file with info on the table as well as the four partitions
below </li>
+
+<li>emp_country=in/emp_state=ka - a directory which now contains all the data files
for in/ka partition </li>
+
+<li>emp_country=in/emp_state=tn - a directory which now contains all the data files
for in/tn partition</li>
+
+<li>emp_country=us/emp_state=ka - a directory which now contains all the data files
for us/ka partition </li>
+
+<li>emp_country=us/emp_state=tn - a directory which now contains all the data files
for us/tn partition</li>
+
+</ul>
+<p></p>
+<p>
+<strong>Example 3</strong>
+</p>
+<pre class="code">
+EXPORT TABLE empl PARTITION (emp_country='in') TO 'exports/empl-in'; 
+</pre>
+<p>This example exports a subset of the partitions - those which have country = in
- to the target location. </p>
+<ul>
+
+<li>output directory: exports/empl-in </li>
+
+<li>_metadata - the metadata file with info on the table as well as the two partitions
below </li>
+
+<li>emp_country=in/emp_state=ka - a directory which now contains all the data files
for in/ka partition </li>
+
+<li>emp_country=in/emp_state=tn - a directory which now contains all the data files
for in/tn partition </li>
+
+</ul>
+<p></p>
+<p>
+<strong>Example 4</strong>
+</p>
+<pre class="code">
+EXPORT TABLE empl PARTITION (emp_country='in', emp_state='tn') TO 'exports/empl-in';
+</pre>
+<p>This example exports a single partition - that which has country = in, state = tn
- to the target location. </p>
+<ul>
+
+<li>output directory: exports/empl-in </li>
+
+<li>_metadata - the metadata file with info on the table as well as the single partition
below </li>
+
+<li>emp_country=in/emp_state=tn - a directory which now contains all the data files
for in/tn partition</li>
+
+</ul>
+</div>    
+ 
+ <!-- ==================================================================== -->
+
+<a name="Import+Command"></a>
+<h2 class="h3">Import Command</h2>
+<div class="section">
+<p>Imports a table from a specified location.</p>
+<a name="Syntax-N1016B"></a>
+<h3 class="h4">Syntax</h3>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>IMPORT [[EXTERNAL] TABLE tablename [PARTITION (partcol1=val1, partcol2=val2, ...)]]
FROM 'filepath' [LOCATION 'tablepath']</p>
+            
+</td>
+        
+</tr>
+    
+</table>
+<a name="Terms-N10180"></a>
+<h3 class="h4">Terms</h3>
+<table class="ForrestTable" cellspacing="1" cellpadding="4">
+	    
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>EXTERNAL</p>
+            
+</td>
+            <td colspan="1" rowspan="1">
+               
+<p>Indicates that the imported table is an external table.</p>
+            
+</td>
+        
+</tr>
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>TABLE tablename</p>
+            
+</td>
+            <td colspan="1" rowspan="1">
+               
+<p>The target to be imported, either a table or a partition.</p>
+               
+<p>If the table is partitioned, you can import a specific partition by specifying
values for all of the partitioning columns, or import all the (exported) partitions by omitting
the partition specification from the command. </p>
+            
+</td>
+        
+</tr>
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>PARTITION (partcol=val ...)</p>
+            
+</td>
+            <td colspan="1" rowspan="1">
+               
+<p>The partition column/value specifications.</p>
+            
+</td>
+        
+</tr>         
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>FROM 'filepath'</p>
+            
+</td>
+            <td colspan="1" rowspan="1">
+               
+<p>The filepath (in single quotes) designating the source location the table will be
copied from. The file path can be:</p>
+               
+<ul>
+               
+<li>a relative path ('project/data1') </li>
+               
+<li>an absolute path ('/user/hcat/project/data1') </li>
+               
+<li>a full URI with scheme and, optionally, an authority ('hdfs://namenode:9000/user/hcat/project/data1')
</li>
+               
+</ul>
+            
+</td>
+        
+</tr> 
+        
+<tr>
+            
+<td colspan="1" rowspan="1">
+               
+<p>LOCATION 'tablepath'</p>
+            
+</td>
+            <td colspan="1" rowspan="1">
+               
+<p>(optional) The tablepath (in single quotes) designating the target location the
table will be copied to.</p>
+               
+<p>If not specified, then:</p>
+               
+<ul>
+				 
+<li>For managed tables, the default location of the table within the warehouse/database
directory structure is used. </li>
+				 
+<li>For external tables, the data is imported in-place; that is, no copying takes place.</li>
+			   
+</ul>
+            
+</td>
+        
+</tr> 
+   
+</table>
+<a name="Usage-N10204"></a>
+<h3 class="h4">Usage</h3>
+<p>The IMPORT command imports a table's data and metadata from the specified location.
The table can be a managed table (data and metadata are both removed on drop table/partition)
or an external table (only metadata is removed on drop table/partition). For more information,
see Hive's <a href="https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-Create%2FDropTable">Create/Drop
Table</a>.</p>
+<p>Because the command actually <strong>copies</strong> the files defined
for the table/partitions, you should be aware of the following:</p>
+<ul>
+	
+<li>No record-level filtering, ordering, etc. is done as part of the import.
</li>
+	
+<li>Since HCatalog only does file-level copies, the data is not transformed in any way
while performing the export/import. </li>
+	
+<li>You, the user, are responsible for ensuring that the correct binaries are available
in the target environment (compatible serde classes, hcat storage drivers, etc.).</li>
+	
+</ul>
+<p>Also, note the following:</p>
+<ul>
+	
+<li>The filepath should contain the files as created by the EXPORT command, by HCatEximOutputFormat,
or by the Pig HCatEximStorer. </li>
+	
+<li>Currently only HDFS is supported as the filesystem in production mode; pfile can
be used for testing purposes. </li>
+    
+<li>The target table may or may not exist prior to the import. If it does exist, it
should be compatible with the imported table.
+           <ul>
+           
+<li>The column schema and the partitioning schema should match. If partitioned, there
should not be any existing partitions with the same specs as the imported partitions. </li>
+           
+<li>The target table/partition should be empty. </li>
+           
+<li>External/Location checks: 
+           <ul>
+           
+<li>The original table type is ignored on import. You specify the required table type
as part of the command. </li>
+           
+<li>For non-partitioned tables, the new table location as specified by the command
should match the existing table location. </li>
+           
+<li>For partitioned tables, the table type (external/managed) should match. </li>
+           
+<li>For non-partitioned tables imported as an external table, you will be asked to
drop the existing table first. </li>
+           
+</ul>
+           
+</li>
+           
+<li>The HCatalog storage driver specification should match. </li>
+           
+<li>The serde, sort and bucket specs should match. </li>
+           
+</ul>
+     
+</li>      
+	
+<li>You must have access rights as per the hcat access control mechanisms. </li>
+	
+<li>You should have read access to the source location. </li>
+	
+</ul>
+<a name="Examples-N1025A"></a>
+<h3 class="h4">Examples</h3>
+<p>The examples assume the following tables:</p>
+<ul>
+	
+<li>dept - non partitioned </li>
+    
+<li>empl - partitioned on emp_country, emp_state, has four partitions ("us"/"ka", "us"/"tn",
"in"/"ka", "in"/"tn") </li>
+	
+</ul>
+<p></p>
+<p>
+<strong>Example 1</strong>
+</p>
+<pre class="code">
+IMPORT FROM 'exports/dept'; 
+</pre>
+<p>This example imports the table as a managed target table, default location. The
metadata is stored in the metastore and the table's data files in the warehouse location of
the current database.</p>
+<p></p>
+<p>
+<strong>Example 2</strong>
+</p>
+<pre class="code">
+IMPORT TABLE renamed_name FROM 'exports/dept';
+</pre>
+<p>This example imports the table as a managed target table, default location. The
imported table is given a new name.</p>
+<p></p>
+<p>
+<strong>Example 3</strong>
+</p>
+<pre class="code">
+IMPORT EXTERNAL TABLE name FROM 'exports/dept'; 
+</pre>
+<p>This example imports the table as an external target table, imported in-place. The
metadata is copied to the metastore. </p>
+<p></p>
+<p>
+<strong>Example 4</strong>
+</p>
+<pre class="code">
+IMPORT EXTERNAL TABLE name FROM 'exports/dept' LOCATION 'tablestore/dept';
+</pre>
+<p>This example imports the table as an external target table, imported to another
location. The metadata is copied to the metastore.</p>
+<p></p>
+<p>
+<strong>Example 5</strong>
+</p>
+<pre class="code">
+IMPORT TABLE name FROM 'exports/dept' LOCATION 'tablestore/dept'; 	
+</pre>
+<p>This example imports the table as a managed target table, non-default location.
The metadata is copied to the metastore. </p>
+<p></p>
+<p>
+<strong>Example 6</strong>
+</p>
+<pre class="code">
+IMPORT TABLE empl FROM 'exports/empl'; 	
+</pre>
+<p>This example imports all the exported partitions since the source was a partitioned
table.</p>
+<p></p>
+<p>
+<strong>Example 7</strong>
+</p>
+<pre class="code">
+IMPORT TABLE empl PARTITION (emp_country='in', emp_state='tn') FROM 'exports/empl'; 
+</pre>
+<p>This example imports only the specified partition. </p>
+</div>   
+ 
+  <!-- ==================================================================== -->
+
+<a name="Usage+with+MapReduce"></a>
+<h2 class="h3">Usage with MapReduce</h2>
+<div class="section">
+<p>HCatEximOutputFormat and HCatEximInputFormat can be used in Hadoop environments
where no HCatalog instance is available. HCatEximOutputFormat can be used to create
an 'exported table' dataset, which can later be imported into an HCatalog instance. The dataset
can also be read later via HCatEximInputFormat or HCatEximLoader. </p>
+<a name="HCatEximOutputFormat"></a>
+<h3 class="h4">HCatEximOutputFormat </h3>
+<pre class="code">
+  public static void setOutput(Job job, String dbname, String tablename, String location,
+      HCatSchema partitionSchema, List&lt;String&gt; partitionValues, HCatSchema
columnSchema) throws HCatException;
+
+  public static void setOutput(Job job, String dbname, String tablename, String location,
+          HCatSchema partitionSchema,
+          List&lt;String&gt; partitionValues,
+          HCatSchema columnSchema,
+          String isdname, String osdname,
+          String ifname, String ofname,
+          String serializationLib) throws HCatException;
+</pre>
+<p>The user can specify the parameters of the table to be created by means of the setOutput
method. The metadata and the data files are created in the specified location. </p>
+<p>The target location must be empty and the user must have write access.</p>
+<a name="HCatEximInputFormat"></a>
+<h3 class="h4">HCatEximInputFormat </h3>
+<pre class="code">
+  public static List&lt;HCatSchema&gt; setInput(Job job,
+      String location,
+      Map&lt;String, String&gt; partitionFilter) throws IOException;
+
+  public static void setOutputSchema(Job job, HCatSchema hcatSchema) throws IOException;
+</pre>
+<p>The user specifies the data collection location and optionally a filter for the
partitions to be loaded via the setInput method. Optionally, the user can also specify the
projection columns via the setOutputSchema method. </p>
+<p>The source location should have the correct layout for an exported table, and
the user should have read access. </p>
+</div>   
+
+  <!-- ==================================================================== -->
+
+<a name="Usage+with+Pig"></a>
+<h2 class="h3">Usage with Pig</h2>
+<div class="section">
+<p>HCatEximStorer and HCatEximLoader can be used in Hadoop/Pig environments where
no HCatalog instance is available. HCatEximStorer can be used to create an 'exported table'
dataset, which can later be imported into an HCatalog instance. The dataset can also be read later
via HCatEximInputFormat or HCatEximLoader. </p>
+<a name="HCatEximStorer"></a>
+<h3 class="h4">HCatEximStorer </h3>
+<pre class="code">
+  public HCatEximStorer(String outputLocation) 
+      throws FrontendException, ParseException;
+  public HCatEximStorer(String outputLocation, String partitionSpec) 
+      throws FrontendException, ParseException;
+  public HCatEximStorer(String outputLocation, String partitionSpec, String schema)
+      throws FrontendException, ParseException;
+</pre>
+<p>The HCatEximStorer is initialized with the output location for the exported table.
Optionally, the user can specify the partition specification for the data and rename the
schema elements as part of the storer. </p>
+<p>The rest of the storer semantics use the same design as HCatStorer.</p>
+<p>
+<strong>Example</strong>
+</p>
+<pre class="code">
+A = LOAD 'empdata' USING PigStorage(',') 
+    AS (emp_id:int,emp_name:chararray,emp_dob:chararray,emp_sex:chararray,emp_country:chararray,emp_state:chararray);
+INTN = FILTER A BY emp_country == 'IN' AND emp_state == 'TN';
+INKA = FILTER A BY emp_country == 'IN' AND emp_state == 'KA';
+USTN = FILTER A BY emp_country == 'US' AND emp_state == 'TN';
+USKA = FILTER A BY emp_country == 'US' AND emp_state == 'KA';
+STORE INTN INTO 'default.employee' USING org.apache.HCatalog.pig.HCatEximStorer('exim/pigout/employee',
'emp_country=in,emp_state=tn');
+STORE INKA INTO 'default.employee' USING org.apache.HCatalog.pig.HCatEximStorer('exim/pigout/employee',
'emp_country=in,emp_state=ka');
+STORE USTN INTO 'default.employee' USING org.apache.HCatalog.pig.HCatEximStorer('exim/pigout/employee',
'emp_country=us,emp_state=tn');
+STORE USKA INTO 'default.employee' USING org.apache.HCatalog.pig.HCatEximStorer('exim/pigout/employee',
'emp_country=us,emp_state=ka');
+</pre>
+<a name="HCatEximLoader"></a>
+<h3 class="h4">HCatEximLoader </h3>
+<pre class="code">
+public HCatEximLoader();
+</pre>
+<p>The HCatEximLoader is passed the location of the exported table as usual via the
LOAD statement. The loader loads the metadata and data as required from the location. Note
that partition filtering is not done efficiently when HCatEximLoader is used; the filtering is
done at the record level rather than at the file level. </p>
+<p>The rest of the loader semantics use the same design as HCatLoader.</p>
+<p>
+<strong>Example</strong>
+</p>
+<pre class="code">
+A = LOAD 'exim/pigout/employee' USING org.apache.HCatalog.pig.HCatEximLoader();
+dump A;
+</pre>
+</div>
+
+  <!-- ==================================================================== -->
+
+<a name="Use+Cases"></a>
+<h2 class="h3">Use Cases</h2>
+<div class="section">
+<p>
+<strong>Use Case 1</strong>
+</p>
+<p>Transfer data between different HCatalog/Hadoop instances, with no renaming of tables.</p>
+<ul>
+
+<li>Instance A - HCatalog: export table A into 'locationA'; </li>
+
+<li>Hadoop: distcp hdfs://locationA hdfs://locationB </li>
+
+<li>Instance B - HCatalog: import from 'locationB'; </li>
+
+</ul>
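+<p>As a sketch, the steps above might look like the following (the paths, the namenode
addresses, and the choice of running the HCatalog commands from a CLI session are all
illustrative):</p>
+<pre class="code">
+-- on instance A
+EXPORT TABLE A TO 'exports/A';
+
+# from a shell with access to both clusters
+hadoop distcp hdfs://namenodeA:9000/user/hcat/exports/A hdfs://namenodeB:9000/user/hcat/exports/A
+
+-- on instance B
+IMPORT FROM 'exports/A';
+</pre>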
+<p></p>
+<p>
+<strong>Use Case 2</strong>
+</p>
+<p>Transfer data to a Hadoop instance which does not have HCatalog and process it there.</p>
+<ul>
+
+<li>Instance A - HCatalog: export table A into 'locationA'; </li>
+
+<li>Hadoop: distcp hdfs://locationA hdfs://locationB </li>
+
+<li>Instance B - Map/Reduce job example 
+</li>
+
+</ul>
+<pre class="code">
+    //job setup
+    ...
+    HCatEximInputFormat.setInput(job, "hdfs://locationB", partitionSpec);
+    job.setInputFormatClass(HCatEximInputFormat.class);
+    ...
+
+    //map setup
+    protected void setup(Context context) throws IOException, InterruptedException {
+      super.setup(context);
+       ...
+       recordSchema = HCatBaseInputFormat.getTableSchema(context);
+       ...
+    }
+
+    //map task
+    public void map(LongWritable key, HCatRecord value, Context context) throws IOException,
+        InterruptedException {
+        ...
+        String colValue = value.getString("emp_name", recordSchema);
+        ...
+    }
+</pre>
+<ul>
+
+<li>Instance B - Pig example 
+</li>
+
+</ul>
+<pre class="code">
+   ...
+   A = LOAD '/user/krishnak/pig-exports/employee-nonpartn' USING org.apache.HCatalog.pig.HCatEximLoader();
+   ...
+</pre>
+<p></p>
+<p>
+<strong>Use Case 3</strong>
+</p>
+<p>Create an exported dataset in a Hadoop instance which does not have HCatalog and
then import it into HCatalog in a different instance.</p>
+<ul>
+
+<li>Instance A - Map/Reduce job example </li>
+
+</ul>
+<pre class="code">
+    //job setup
+    ...
+    List&lt;HCatFieldSchema&gt; columns = new ArrayList&lt;HCatFieldSchema&gt;();
+    columns.add(HCatSchemaUtils.getHCatFieldSchema(new FieldSchema("emp_id",
+        Constants.INT_TYPE_NAME, "")));
+    ...
+    List&lt;HCatFieldSchema&gt; partKeys = new ArrayList&lt;HCatFieldSchema&gt;();
+    partKeys.add(HCatSchemaUtils.getHCatFieldSchema(new FieldSchema("emp_country",
+        Constants.STRING_TYPE_NAME, "")));
+    partKeys.add(HCatSchemaUtils.getHCatFieldSchema(new FieldSchema("emp_state",
+        Constants.STRING_TYPE_NAME, "")));
+    HCatSchema partitionSchema = new HCatSchema(partKeys);
+    List&lt;String&gt; partitionVals = new ArrayList&lt;String&gt;();
+    partitionVals.add(...);
+    partitionVals.add(...);
+    ...
+    HCatEximOutputFormat.setOutput(job, "default", "employee", "hdfs:/user/krishnak/exim/employee",
+        partitionSchema, partitionVals, new HCatSchema(columns));
+    job.setOutputFormatClass(HCatEximOutputFormat.class);
+    ...
+
+    //map setup
+    protected void setup(Context context) throws IOException, InterruptedException {
+      super.setup(context);
+       ...
+       recordSchema = HCatEximOutputFormat.getTableSchema(context);
+       ...
+    }
+
+    //map task
+    public void map(LongWritable key, HCatRecord value, Context context) throws IOException,
+        InterruptedException {
+        ...
+        HCatRecord record = new DefaultHCatRecord(recordSchema.size());
+        record.setInteger("emp_id", recordSchema, Integer.valueOf(cols[0]));
+        record.setString("emp_name", recordSchema, cols[1]);
+        ...
+        context.write(key, record);
+        ...
+    }
+</pre>
+<ul>
+
+<li>Instance A - Pig example </li>
+
+</ul>
+<pre class="code">
+   ...
+   STORE INTN INTO 'default.employee'
+      USING org.apache.hcatalog.pig.HCatEximStorer('/user/krishnak/pig-exports/employee', 'emp_country=IN,emp_state=TN');
+   ...
+</pre>
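The second argument to HCatEximStorer above is a Hive-style partition specification: partition keys paired with values, in the same order as the table's partition columns. As a plain illustration of how such a spec string is composed (this helper is not part of HCatalog or Pig), consider:

```python
def partition_spec(keys, values):
    # Builds the "k1=v1,k2=v2" partition spec string, pairing each
    # partition key with its value in declaration order.
    return ",".join(f"{k}={v}" for k, v in zip(keys, values))

print(partition_spec(["emp_country", "emp_state"], ["IN", "TN"]))
# emp_country=IN,emp_state=TN
```

Key order matters: swapping the values would silently write the data under the wrong partition.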
+<ul>
+
+<li>Hadoop: distcp hdfs://locationA hdfs://locationB </li>
+
+<li>Instance B - HCatalog: import from 'locationB'; </li>
+
+</ul>
+</div>
+
+  
+</div>
+<!--+
+    |end content
+    +-->
+<div class="clearboth">&nbsp;</div>
+</div>
+<div id="footer">
+<!--+
+    |start bottomstrip
+    +-->
+<div class="lastmodified">
+<script type="text/javascript"><!--
+document.write("Last Published: " + document.lastModified);
+//  --></script>
+</div>
+<div class="copyright">
+        Copyright &copy;
+         2011 <a href="http://www.apache.org/licenses/">The Apache Software Foundation</a>
+</div>
+<!--+
+    |end bottomstrip
+    +-->
+</div>
+</body>
+</html>

Added: incubator/hcatalog/site/publish/docs/r0.2.0/importexport.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/docs/r0.2.0/importexport.pdf?rev=1178503&view=auto
==============================================================================
Binary file - no diff available.

Propchange: incubator/hcatalog/site/publish/docs/r0.2.0/importexport.pdf
------------------------------------------------------------------------------
    svn:mime-type = application/octet-stream

Modified: incubator/hcatalog/site/publish/index.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/index.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/index.pdf (original) and incubator/hcatalog/site/publish/index.pdf
Mon Oct  3 18:32:27 2011 differ

Modified: incubator/hcatalog/site/publish/issue_tracking.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/issue_tracking.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/issue_tracking.pdf (original) and incubator/hcatalog/site/publish/issue_tracking.pdf
Mon Oct  3 18:32:27 2011 differ

Modified: incubator/hcatalog/site/publish/linkmap.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/linkmap.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/linkmap.pdf (original) and incubator/hcatalog/site/publish/linkmap.pdf
Mon Oct  3 18:32:27 2011 differ

Modified: incubator/hcatalog/site/publish/mailing_lists.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/mailing_lists.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/mailing_lists.pdf (original) and incubator/hcatalog/site/publish/mailing_lists.pdf
Mon Oct  3 18:32:27 2011 differ

Modified: incubator/hcatalog/site/publish/privacypolicy.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/privacypolicy.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/privacypolicy.pdf (original) and incubator/hcatalog/site/publish/privacypolicy.pdf
Mon Oct  3 18:32:27 2011 differ

Modified: incubator/hcatalog/site/publish/releases.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/releases.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/releases.pdf (original) and incubator/hcatalog/site/publish/releases.pdf
Mon Oct  3 18:32:27 2011 differ

Modified: incubator/hcatalog/site/publish/version_control.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/version_control.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/version_control.pdf (original) and incubator/hcatalog/site/publish/version_control.pdf
Mon Oct  3 18:32:27 2011 differ

Modified: incubator/hcatalog/site/publish/whoweare.pdf
URL: http://svn.apache.org/viewvc/incubator/hcatalog/site/publish/whoweare.pdf?rev=1178503&r1=1178502&r2=1178503&view=diff
==============================================================================
Files incubator/hcatalog/site/publish/whoweare.pdf (original) and incubator/hcatalog/site/publish/whoweare.pdf
Mon Oct  3 18:32:27 2011 differ


