hawq-commits mailing list archives

From yo...@apache.org
Subject [03/50] incubator-hawq-docs git commit: more content and rearranging of pxf hdfs plugin page
Date Mon, 31 Oct 2016 22:13:13 GMT
more content and rearranging of pxf hdfs plugin page


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/2da7a92a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/2da7a92a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/2da7a92a

Branch: refs/heads/tutorial-proto
Commit: 2da7a92a3e8431335a48005d55a70c9eba333e16
Parents: 9ca2779
Author: Lisa Owen <lowen@pivotal.io>
Authored: Mon Oct 17 16:27:23 2016 -0700
Committer: Lisa Owen <lowen@pivotal.io>
Committed: Mon Oct 17 16:27:23 2016 -0700

----------------------------------------------------------------------
 pxf/HDFSFileDataPXF.html.md.erb | 331 ++++++++---------------------------
 1 file changed, 76 insertions(+), 255 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/2da7a92a/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index e1c621f..e49688e 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -17,27 +17,27 @@ Before working with HDFS file data using HAWQ and PXF, ensure that:
 
 The PXF HDFS plug-in supports the following file formats:
 
-- TextFile - comma-separated value or delimited format plain text file
+- TextFile - comma-separated value (.csv) or delimited format plain text file
 - SequenceFile - flat file consisting of binary key/value pairs
 - Avro - JSON-defined, schema-based data serialization format
 
 The PXF HDFS plug-in includes the following profiles to support the file formats listed above:
 
-- `HdfsTextSimple`
-- `HdfsTextMulti`
-- `SequenceWritable`
-- `Avro`
+- `HdfsTextSimple` - text files
+- `HdfsTextMulti` - text files with embedded line feeds
+- `SequenceWritable` - SequenceFile
+- `Avro` - Avro files
 
 ## <a id="hdfsplugin_datatypemap"></a>Data Type Mapping
 jjj
 
 
 ## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
-HAWQ includes command-line tools that interact directly with HDFS.  These tools support typical
file system operations including copying, listing, changing file permissions, etc. 
+Hadoop includes command-line tools that interact directly with HDFS.  These tools support typical file system operations, including copying and listing files and changing file permissions.
 
 The HDFS file system command is `hdfs dfs <options> [<file>]`. Invoked with no
options, `hdfs dfs` lists the file system options supported by the tool.
 
-`hdfs dfs` options used in this section are listed in the table below:
+`hdfs dfs` options used in this section are identified in the table below:
 
 | Option  | Description |
 |-------|-------------------------------------|
@@ -57,7 +57,7 @@ Create a delimited plain text file:
 $ vi /tmp/pxf_hdfs_ts.txt
 ```
 
-Add the following data to `pxf_hdfs_ts.txt`:
+Copy and paste the following data into `pxf_hdfs_ts.txt`:
 
 ``` pre
 Prague,Jan,101,4875.33
@@ -66,7 +66,7 @@ Bangalore,May,317,8936.99
 Beijing,Jul,411,11600.67
 ```
 
-Notice the use of the comma `,` to separate field values.
+Notice the use of the comma `,` to separate the four data fields.
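Before loading the file into HDFS, a quick local check can confirm that every row carries the same number of comma-separated fields. This is a sketch only, not part of the documented procedure; the check file name is hypothetical, and only the sample rows shown above are used:

```shell
# Hypothetical local sanity check using two of the sample rows shown above.
cat > /tmp/pxf_hdfs_ts_check.txt <<'EOF'
Prague,Jan,101,4875.33
Beijing,Jul,411,11600.67
EOF
# Print the comma-separated field count for each row; each row should report 4.
awk -F',' '{ print NF }' /tmp/pxf_hdfs_ts_check.txt
```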
 
 Add the data file to HDFS:
 
@@ -86,7 +86,7 @@ Create a second delimited plain text file:
 $ vi /tmp/pxf_hdfs_tm.txt
 ```
 
-Add the following data to `pxf_hdfs_tm.txt`:
+Copy/paste the following data into `pxf_hdfs_tm.txt`:
 
 ``` pre
 "4627 Star Rd.
@@ -101,7 +101,7 @@ Chicago, IL  60605":Jul:2017
 Columbus, OH  43213":Dec:2018
 ```
 
-Notice the use of the colon `:` to separate field values. Also notice the quotes around the
first/address field. This field includes an embedded line feed.
+Notice the use of the colon `:` to separate the three fields. Also notice the quotes around
the first (address) field. This field includes an embedded line feed.
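A record with an embedded line feed spans more than one physical line, which is why a multi-line-aware profile such as `HdfsTextMulti` is needed for this file. The following sketch (hypothetical file name; a single hypothetical record assembled in the same shape as the sample lines above) shows the mismatch between physical lines and delimited records:

```shell
# Hypothetical single-record file; the quoted address field contains an
# embedded line feed, so one logical record occupies two physical lines.
cat > /tmp/pxf_hdfs_tm_check.txt <<'EOF'
"4627 Star Rd.
Chicago, IL  60605":Jul:2017
EOF
# Count physical lines (two for this single logical record):
wc -l < /tmp/pxf_hdfs_tm_check.txt
# Only the second physical line contains the ':' field delimiters:
grep -c ':' /tmp/pxf_hdfs_tm_check.txt
```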
 
 Add the data file to HDFS:
 
@@ -112,7 +112,7 @@ $ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_tm.txt /data/pxf_examples/
 You will use these HDFS files in later sections.
 
 ## <a id="hdfsplugin_queryextdata"></a>Querying External HDFS Data
-The PXF HDFS plug-in supports several HDFS-related profiles. These include `HdfsTextSimple`,
`HdfsTextMulti`, `SequenceWritable`, and `Avro`.
+The PXF HDFS plug-in supports several profiles. These include `HdfsTextSimple`, `HdfsTextMulti`,
`SequenceWritable`, and `Avro`.
 
 Use the following syntax to create a HAWQ external table representing HDFS data: 
 
@@ -126,24 +126,29 @@ FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
 
 HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html)
call are described in the table below.
 
-**Note**: Some profile-specific options and properties may be discussed in the relevant profile
section later in this topic.
-
 | Keyword  | Value |
 |-------|-------------------------------------|
-| host    | The HDFS NameNode. |
-| \<path-to-hdfs-file\>    | path to the file in the HDFS data store |
-| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`,
`SequenceWritable` or `Avro`. |
-| \<custom-option\>  | \<custom-option\> is profile-specific. |
-| FORMAT 'TEXT' | Use `TEXT` `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\>
references a plain text delimited file.  |
-| FORMAT 'CSV' | Use `CSV` `FORMAT` with `HdfsTextSimple` and `HdfsTextMulti` profiles when
\<path-to-hdfs-file\> references a comma-separated value file.  |
-| FORMAT 'CUSTOM' | Use the`CUSTOM` `FORMAT` with `Avro` and `SequenceWritable` profiles.
The `CUSTOM` format supports only the built-in `formatter='pxfwritable_export'` \<formatting-property\>.
 |
-| \<formatting-properties\> | \<formatting-properties\> are profile-specific.
|
+| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
+| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`,
`SequenceWritable`, or `Avro`. |
+| \<custom-option\>  | \<custom-option\> is profile-specific. Profile-specific
options are discussed in the relevant profile topic later in this section.|
+| FORMAT 'TEXT' | Use the `TEXT` `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> references a plain text delimited file.  |
+| FORMAT 'CSV' | Use the `CSV` `FORMAT` with the `HdfsTextSimple` and `HdfsTextMulti` profiles when \<path-to-hdfs-file\> references a comma-separated value file.  |
+| FORMAT 'CUSTOM' | Use the `CUSTOM` `FORMAT` with the `Avro` and `SequenceWritable` profiles. The `CUSTOM` `FORMAT` supports only the built-in `formatter='pxfwritable_export'` \<formatting-property\>. |
+| \<formatting-properties\>    | \<formatting-properties\> are profile-specific. Profile-specific formatting options are discussed in the relevant profile topic later in this section. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
 
-Note: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT`
specification.
+## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
 
-### <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
+Use the `HdfsTextSimple` profile when reading plain text delimited or .csv files where each
row is a single record.
+
+\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| delimiter    | The delimiter character in the file. Default value is a comma `,`.|
 
-Use the `HdfsTextSimple` profile when reading plain text delimited or csv files where each
row is a single record.
 
 The following SQL call uses the PXF `HdfsTextSimple` profile to create a queryable HAWQ external
table from the `pxf_hdfs_ts.txt` file you created and added to HDFS in an earlier section:
 
@@ -173,12 +178,18 @@ gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text,
month tex
 gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;          
 ```
 
-When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter
option is required, as comma is the default delimiter.
+**Note**: When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required; comma is the default delimiter.
 
-### <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
+## <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
 
 Use the `HdfsTextMulti` profile when reading plain text files with delimited single- or multi-
line records that include embedded (quoted) linefeed characters.
 
+\<formatting-properties\> supported by the `HdfsTextMulti` profile include:
+
+| Keyword  | Value |
+|-------|-------------------------------------|
+| delimiter    | The delimiter character in the file. |
+
 The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryable HAWQ external
table from the `pxf_hdfs_tm.txt` file you created and added to HDFS in an earlier section:
 
 ``` sql
@@ -204,25 +215,28 @@ gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
 (5 rows)
 ```
 
-### <a id="profile_hdfsseqwritable"></a>SequenceWritable Profile 
+## <a id="profile_hdfsseqwritable"></a>SequenceWritable Profile 
 
 Use the `SequenceWritable` profile when reading SequenceFile format files. Files of this
type consist of binary key/value pairs. Sequence files are a common data transfer format between
MapReduce jobs. 
 
-The `SequenceWritable` profile supports the following \<custom-options\> or \<formatting-properties\>:
+The `SequenceWritable` profile supports the following \<custom-options\>:
 
 | Keyword  | Value Description |
 |-------|-------------------------------------|
-| COMPRESSION_CODEC    | Specifies the compression codec Java class name  |
+| COMPRESSION_CODEC    | The compression codec Java class name. |
 | COMPRESSION_TYPE    | The compression type of the sequence file; supported values are `RECORD`
(the default) or `BLOCK`. |
-| \<path-to-hdfs-file\>    | path to the file in the HDFS data store |
+| DATA-SCHEMA    | The name of the writer serialization class. The jar file in which this
class resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode.
Default value is `TRUE` - requests can run in multi-thread mode. When set to `FALSE`, requests
will be handled in a single thread. |
 
-MORE HERE
+???? MORE HERE
 
-### <a id="profile_hdfsavro"></a>Avro Profile
+??? ADDRESS SERIALIZATION
+
+## <a id="profile_hdfsavro"></a>Avro Profile
 
 Avro files store metadata with the data. Avro files also allow specification of an independent
schema used when reading the file. 
 
-#### <a id="profile_hdfsavrodatamap"></a>Data Type Mapping
+### <a id="profile_hdfsavrodatamap"></a>Data Type Mapping
 
 To represent Avro data in HAWQ, map data values that use a primitive data type to HAWQ columns
of the same type. 
 
@@ -230,8 +244,6 @@ Avro supports complex data types including arrays, maps, records, enumerations,
 
 The following table summarizes external mapping rules for Avro data.
 
-<caption><span class="tablecap">Table 2. Avro Data Type Mapping</span></caption>
-
 <a id="topic_oy3_qwm_ss__table_j4s_h1n_ss"></a>
 
 | Avro Data Type | PXF Type |
@@ -241,25 +253,23 @@ The following table summarizes external mapping rules for Avro data.
 | Complex type: Fixed | BYTEA |
 | Union | Follows the above conventions for primitive or complex data types, depending on the union; supports Null values. |
 
-#### <a id="profile_hdfsavroptipns"></a>Avro-Specific Formatting Options
+### <a id="profile_hdfsavroptipns"></a>Avro-Specific Custom Options
 
 For complex types, the PXF Avro profile inserts default delimiters between collection items
and values. You can use non-default delimiter characters by identifying values for specific
Avro custom options in the `CREATE EXTERNAL TABLE` call. 
 
-The Avro profile supports the following custom options:
-
-<caption><span class="tablecap">Table N. Avro Formatting Options</span></caption>
+The Avro profile supports the following \<custom-options\>:
 
 | Option Name   | Description |
 |---------------|--------------------|
 | COLLECTION_DELIM | The delimiter character(s) to place between entries in a top-level array, map, or record field when PXF maps an Avro complex data type to a text column. The default is a comma `,` character. |
 | MAPKEY_DELIM | The delimiter character(s) to place between the key and value of a map entry
when PXF maps an Avro complex data type to a text column. The default is a colon `:` character.
|
-| RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value
of a record entry when PXF maps an Avro complex data type to a text colum. The default is
a colon `:` character. |
-| SCHEMA-DATA | The data schema file used to create and read the HDFS file. For an Avro
file, you may create an avsc. This option has no default value. |
-| THREAD-SAFE | Determines if the table query can run in multi-thread mode or not. Allowed
values are `TRUE`, `FALSE`. Default value is `TRUE` - requests can run in multithread mode.
When set to FALSE, requests will be handled in a single thread. |
+| RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value
of a record entry when PXF maps an Avro complex data type to a text column. The default is
a colon `:` character. |
+| SCHEMA-DATA | The data schema file used to create and read the HDFS file. This option
has no default value. |
+| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode.
Default value is `TRUE` - requests can run in multi-thread mode. When set to `FALSE`, requests
will be handled in a single thread. |
 
-#### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
+### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
 
-Avro schemas are defined using JSON, and composed of the same primitive and complex types
identified in the data mapping section above. Avro schema file typically have a `.avsc` suffix.
+Avro schemas are defined using JSON and are composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix.
 
 Fields in an Avro schema file are defined via an array of objects, each of which is specified by a name and a type. The field type is another schema object.
 
@@ -324,7 +334,7 @@ Copy and paste the following text into `avro_schema.avsc`:
 
 An Avro schema, together with its data, is fully self-describing.  
 
-#### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Sample Avro Data (JSON)
+### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Sample Avro Data (JSON)
 
 Create a text file named `pxf_hdfs_avro.txt`:
 
@@ -342,7 +352,7 @@ Enter the following data into `pxf_hdfs_avro.txt`:
 
 The sample data uses a comma `,` to separate top level records and a colon `:` to separate
map/key values and record field name/values.
 
-Convert the text file to Avro format. There are various ways to perform the conversion programmatically
and via the command line. In this example, we are using the [Java Avro tools](http://avro.apache.org/releases.html)
and the jar file resides in the current directory:
+Convert the text file to Avro format. There are various ways to perform the conversion programmatically
and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html);
the jar file resides in the current directory:
 
 ``` shell
 $ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt
> /tmp/pxf_hdfs_avro.avro
@@ -397,173 +407,14 @@ gpadmin=# SELECT username, address FROM followers_view WHERE followers
@> '{john
  jim      | {number:9,street:deer creek,city:palo alto}
 ```
 
-~~XXXXX
-
-
-### <a id="resolver"></a>Resolver
-
-Choose the Resolver format if data records are serialized in the HDFS file. 
-
-**Note:** You must include a Profile or a Resolver in the table definition.
-
-<table>
-<colgroup>
-<col width="33%" />
-<col width="33%" />
-<col width="33%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Record Serialization</th>
-<th>Resolver</th>
-<th>Comments</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>Avro</td>
-<td>org.apache.hawq.pxf.plugins. hdfs.AvroResolver</td>
-<td><ul>
-<li>Avro files include the record schema, Avro serialization can be used in other file
types (e.g, Sequence File). </li>
-<li>For Avro serialized records outside of an Avro file, include a schema file name
(.avsc) in the url under the optional <code class="ph codeph">Schema-Data </code>option.</li>
-<li>Deserialize Only (Read) .</li>
-</ul></td>
-</tr>
-<tr class="even">
-<td>Java Writable</td>
-<td>org.apache.hawq.pxf.plugins. hdfs.WritableResolver</td>
-<td><ul>
-<li>Include the name of the Java class that uses Writable serialization in the URL
under the optional <code class="ph codeph">Schema-Data.</code></li>
-<li>The class file must exist in the public stage directory (or in Hadoop's class path).</li>
-<li>Deserialize and Serialize (Read + Write). </li>
-<li>See <a href="#customizedwritableschemafileguidelines">Customized Writable
Schema File Guidelines</a>.</li>
-</ul></td>
-</tr>
-<tr class="odd">
-<td>None (plain text)</td>
-<td>org.apache.hawq.pxf.plugins. hdfs.StringPassResolver</td>
-<td><ul>
-<li>Does not serialize plain text records. The database parses plain records. Passes
records as they are.</li>
-<li>Deserialize and Serialize (Read + Write).</li>
-</ul></td>
-</tr>
-</tbody>
-</table>
-
-#### <a id="customizedwritableschemafileguidelines"></a>Schema File Guidelines
for WritableResolver
-
-When using a WritableResolver, a schema file needs to be defined. The file needs to be a
Java class file and must be on the class path of PXF.
-
-The class file must follow the following requirements:
-
-1.  Must implement org.apache.hadoop.io.Writable interface.
-2.  WritableResolver uses reflection to recreate the schema and populate its fields (for
both read and write). Then it uses the Writable interface functions to read/write. Therefore,
fields must be public, to enable access to them. Private fields will be ignored.
-3.  Fields are accessed and populated in the order in which they are declared in the class
file.
-4.  Supported field types:
-    -   boolean
-    -   byte array
-    -   double
-    -   float
-    -   int
-    -   long
-    -   short
-    -   string
-
-    Arrays of any of the above types are supported, but the constructor must define the array
size so the reflection will work.
-
-### <a id="additionaloptions"></a>Additional Options
-
-<a id="additionaloptions__table_skq_kpz_4p"></a>
-
-<table>
-<caption><span class="tablecap">Table 1. Additional PXF Options</span></caption>
-<colgroup>
-<col width="50%" />
-<col width="50%" />
-</colgroup>
-<thead>
-<tr class="header">
-<th>Option Name</th>
-<th>Description</th>
-</tr>
-</thead>
-<tbody>
-<tr class="odd">
-<td>COLLECTION_DELIM</td>
-<td>(Avro or Hive profiles only.) The delimiter character(s) to place between entries
in a top-level array, map, or record field when PXF maps a Hive or Avro complex data type
to a text column. The default is a &quot;,&quot; character.</td>
-</tr>
-<tr class="even">
-<td>COMPRESSION_CODEC</td>
-<td><ul>
-<li>Useful for WRITABLE PXF tables.</li>
-<li>Specifies the compression codec class name for compressing the written data. The
class must implement the org.apache.hadoop.io.compress.CompressionCodec interface.</li>
-<li> Some valid values are org.apache.hadoop.io.compress.DefaultCodec org.apache.hadoop.io.compress.GzipCodec
org.apache.hadoop.io.compress.BZip2Codec.</li>
-<li>Note: org.apache.hadoop.io.compress.BZip2Codec runs in a single thread and can
be slow.</li>
-<li>This option has no default value. </li>
-<li>When the option is not defined, no compression will be done.</li>
-</ul></td>
-</tr>
-<tr class="odd">
-<td>COMPRESSION_TYPE</td>
-<td><ul>
-<li>Useful WRITABLE PXF tables with SequenceFileAccessor.</li>
-<li>Ignored when COMPRESSION_CODEC is not defined.</li>
-<li>Specifies the compression type for sequence file.</li>
-<li>Valid options are: 
-<ul>
-<li>RECORD - only the value part of each row is compressed.</li>
-<li>BLOCK - both keys and values are collected in 'blocks' separately and compressed.</li>
-</ul></li>
-<li>Default value: RECORD.</li>
-</ul></td>
-</tr>
-<tr class="even">
-<td>MAPKEY_DELIM</td>
-<td>(Avro or Hive profiles only.) The delimiter character(s) to place between the key
and value of a map entry when PXF maps a Hive or Avro complex data type to a text colum. The
default is a &quot;:&quot; character.</td>
-</tr>
-<tr class="odd">
-<td>RECORDKEY_DELIM</td>
-<td>(Avro profile only.) The delimiter character(s) to place between the field name
and value of a record entry when PXF maps an Avro complex data type to a text colum. The default
is a &quot;:&quot; character.</td>
-</tr>
-<tr class="even">
-<td>SCHEMA-DATA</td>
-<td>The data schema file used to create and read the HDFS file. For example, you could
create an avsc (for Avro), or a Java class (for Writable Serialization) file. Make sure that
you have added any JAR files containing the schema to <code class="ph codeph">pxf-public.classpath</code>.
-<p>This option has no default value.</p></td>
-</tr>
-<tr class="odd">
-<td>THREAD-SAFE</td>
-<td>Determines if the table query can run in multithread mode or not. When set to FALSE,
requests will be handled in a single thread.
-<p>Should be set when a plug-in or other elements that are not thread safe are used
(e.g. compression codec).</p>
-<p>Allowed values: TRUE, FALSE. Default value is TRUE - requests can run in multithread
mode.</p></td>
-</tr>
-<tr class="even">
-<td> &lt;custom&gt;</td>
-<td>Any option added to the pxf URI string will be accepted and passed, along with
its value, to the Fragmenter, Accessor, and Resolver implementations.</td>
-</tr>
-</tbody>
-</table>
-
-## <a id="accessingdataonahighavailabilityhdfscluster"></a>Accessing HDFS Data
in a High Availability HDFS Cluster
-
-To access data in a High Availability HDFS cluster, change the \<host\> provided
in the URI LOCATION clause. Use *HA\_nameservice* rather than  *name\_node\_host:51200*.
 
-``` sql
-CREATE EXTERNAL TABLE <table_name> 
-    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
-LOCATION ('pxf://<HA-nameservice>/<path-to-hdfs-file>
-    ?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro|SequenceWritable[&<custom-option>=<value>[...]]')
-FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
-```
+## <a id="recordkeyinkey-valuefileformats"></a>Reading the Record Key 
 
-The opposite is true when a highly available HDFS cluster is reverted to a single namenode
configuration. In that case, any table definition that has the \<HA-nameservice\> specified
should use the \<host\>[:\<port\>] syntax. 
+When reading sequence files and other file formats that store rows in a key-value format, you can access the key value through HAWQ by using the `recordkey` keyword as a field name.
 
-## <a id="recordkeyinkey-valuefileformats"></a>Using a Record Key with Key-Value
File Formats
+The field type of `recordkey` must correspond to the key type, much as the other fields must
match the HDFS data. 
 
-For sequence file and other file formats that store rows in a key-value format, the key value
can be accessed through HAWQ by using the saved keyword '`recordkey`' as a field name.
-
-The field type must correspond to the key type, much as the other fields must match the HDFS
data. 
-
-WritableResolver supports read and write of recordkey, which can be of the following Writable
Hadoop types:
+`recordkey` can be of the following Hadoop types:
 
 -   BooleanWritable
 -   ByteWritable
@@ -573,59 +424,29 @@ WritableResolver supports read and write of recordkey, which can be
of the follo
 -   LongWritable
 -   Text
 
-If the `recordkey` field is not defined, the key is ignored in read, and a default value
(segment id as LongWritable) is written in write.
-
 ### <a id="example1"></a>Example
 
-A data schema `Babies.class` contains three fields: (name text, birthday text, weight float). An
external table must include these three fields, and can either include or ignore the recordkey.
+A data schema `Babies.class` contains three fields: name (text), birthday (text), weight
(float). An external table definition for this schema must include these three fields, and
can either include or ignore the `recordkey`.
 
 ``` sql
--- writable table with recordkey
-CREATE WRITABLE EXTERNAL TABLE babies_registry (recordkey int, name text, birthday text,
weight float)
-  LOCATION ('pxf://namenode_host:51200/babies_1940s'
-            '?ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor'
-            '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.WritableResolver'
-            '&DATA-SCHEMA=Babies')
-  FORMAT 'CUSTOM' (formatter='pxfwritable_export');
-INSERT INTO babies_registry VALUES (123456, "James Paul McCartney", "June 18, 1942", 3.800);
-
--- writable table without recordkey
-CREATE WRITABLE EXTERNAL TABLE babies_registry2 (name text, birthday text, weight float)
-  LOCATION ('pxf://namenode_host:51200/babies_1940s'
-            '?ACCESSOR=org.apache.hawq.pxf.plugins.SequenceFileAccessor'
-            '&RESOLVER=org.apache.hawq.pxf.plugins.WritableResolver'
-            '&DATA-SCHEMA=Babies')
-  FORMAT 'CUSTOM' (formatter='pxfwritable_export');
-
--- this record's key will have some default value
-INSERT INTO babies_registry VALUES ("Richard Starkey", "July 7, 1940", 4.0);
+gpadmin=# CREATE EXTERNAL TABLE babies_1940 (recordkey int, name text, birthday text, weight
float)
+            LOCATION ('pxf://namenode:51200/babies_1940s?PROFILE=SequenceWritable&DATA-SCHEMA=Babies')
+          FORMAT 'CUSTOM' (formatter='pxfwritable_import');
+gpadmin=# SELECT * FROM babies_1940;
 ```
 
-The same goes for reading data from an existing file with a key-value format, e.g. a Sequence
file.
+## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability
HDFS Cluster
+
+To access external HDFS data in a High Availability HDFS cluster, change the URI LOCATION
clause to use \<HA-nameservice\> rather than  \<host\>[:\<port\>].
 
 ``` sql
--- readable table with recordkey
-CREATE EXTERNAL TABLE babies_1940 (recordkey int, name text, birthday text, weight float)
-  LOCATION ('pxf://namenode_host:51200/babies_1940s'
-            '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
-            '&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor'
-            '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.WritableResolver'
-            '&DATA-SCHEMA=Babies')
-  FORMAT 'CUSTOM' (formatter='pxfwritable_import');
--- retrieve each record's key
-SELECT * FROM babies_1940;
-
--- readable table without recordkey
-CREATE EXTERNAL TABLE babies_1940_2 (name text, birthday text, weight float)
-  LOCATION ('pxf://namenode_host:51200/babies_1940s'
-            '?FRAGMENTER=org.apache.hawq.pxf.plugins.hdfs.HdfsDataFragmenter'
-            '&ACCESSOR=org.apache.hawq.pxf.plugins.hdfs.SequenceFileAccessor'
-            '&RESOLVER=org.apache.hawq.pxf.plugins.hdfs.WritableResolver'
-            '&DATA-SCHEMA=Babies') 
-  FORMAT 'CUSTOM' (formatter='pxfwritable_import');
--- ignores the records' key
-SELECT * FROM babies_1940_2; 
+gpadmin=# CREATE EXTERNAL TABLE <table_name> ( <column_name> <data_type>
[, ...] | LIKE <other_table> )
+            LOCATION ('pxf://<HA-nameservice>/<path-to-hdfs-file>?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro|SequenceWritable[&<custom-option>=<value>[...]]')
+         FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
 ```
 
+The opposite is true when a highly available HDFS cluster is reverted to a single NameNode
configuration. In that case, any table definition that has the \<HA-nameservice\> specified
should use the \<host\>[:\<port\>] syntax. 
+
+
 ## <a id="hdfs_advanced"></a>Advanced
-If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose
to create a custom HDFS profile from the existing HDFS Accessors and Resolvers. Refer to [XX]()
for information on
\ No newline at end of file
+If you find that the pre-defined PXF HDFS profiles do not meet your needs, you may choose
to create a custom HDFS profile from the existing HDFS Accessors and Resolvers. Refer to [Adding
and Updating Profiles](ReadWritePXF.html#addingandupdatingprofiles) for information on creating
a custom profile.
\ No newline at end of file

