hawq-dev mailing list archives

From kavinderd <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...
Date Fri, 13 Jan 2017 22:26:16 GMT
Github user kavinderd commented on a diff in the pull request:

    --- Diff: markdown/pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,416 @@
    +title: Writing Data to HDFS
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and
`SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ
internal table to binary or text HDFS files.
    +Use the `HdfsTextSimple` profile when writing text data. Use the `SequenceWritable` profile
when dealing with binary data.
    +This section describes how to use these PXF profiles to create writable external tables.
    +**Note**: Tables that you create with writable profiles can only be used for INSERT operations.
 If you want to query inserted data, you must define a separate external readable table that
references the new HDFS file using the equivalent readable profile.  ??You can also create
a Hive table to access the HDFS file.??
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html)
for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services.
    +-   HDFS write permissions are provided to a restricted set of users.
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +Use the following syntax to create a HAWQ external writable table representing HDFS data: 
    +``` sql
    +CREATE WRITABLE EXTERNAL TABLE <table_name>
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html)
call are described in the table below.
    +| Keyword  | Value |
    +|-------|-------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`
or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options
are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile to create a
plain-text-delimited file at the location specified by \<path-to-hdfs-file\>. The `HdfsTextSimple`
'`TEXT`' `FORMAT` supports only the built-in `(delimiter=<delim>)` \<formatting-property\>.
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with the `HdfsTextSimple` profile to create a comma-separated-value
file at the location specified by \<path-to-hdfs-file\>.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile.
The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export)`
(write) and `(formatter='pxfwritable_import)` (read) \<formatting-properties\>.
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your
`FORMAT` specification.
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following custom options:
    +| Option  | Value Description | Profile |
    +|-------|-------|-------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not
provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`
and `org.apache.hadoop.io.compress.BZip2Codec` | HdfsTextSimple, SequenceWritable |
    +|    |  `org.apache.hadoop.io.compress.GzipCodec` | HdfsTextSimple |
    --- End diff --
    Is the formatting of this correct for `org.apache.hadoop.io.compress.GzipCodec`? It seems
like it should be part of the row above.
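
    For context, a writable table that exercises this option might be declared roughly as
    follows. This is only a sketch based on the syntax quoted in the diff: the table name,
    columns, NameNode host, port, and HDFS path are hypothetical and are not taken from the
    pull request.

    ``` sql
    -- Hypothetical table, host, port, and path; only the PROFILE and
    -- COMPRESSION_CODEC values come from the diff under review.
    CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_write_gzip
        ( location text, month text, num_orders int, total_sales float8 )
    LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_write_gzip?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
    FORMAT 'TEXT' (delimiter=',');
    ```

    Per the diff, `GzipCodec` is listed only for `HdfsTextSimple`, which is why the sketch
    uses that profile with the `TEXT` format.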
