hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...
Date Mon, 31 Oct 2016 22:25:51 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85845526
  
    --- Diff: pxf/HDFSWritablePXF.html.md.erb ---
    @@ -0,0 +1,410 @@
    +---
    +title: Writing Data to HDFS
    +---
    +
    +The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and
`SequenceWritable` profiles.  You might create a writable table to export data from a HAWQ
internal table to HDFS.
    +
    +This section describes how to use these PXF profiles to create writable external tables.
    +
    +**Note**: You cannot directly query data in a HAWQ writable table.  After creating the
external writable table, you must create a HAWQ readable external table accessing the HDFS
file, then query that table. ??You can also create a Hive table to access the HDFS file.??
    +
    +## <a id="pxfwrite_prereq"></a>Prerequisites
    +
    +Before working with HDFS file data using HAWQ and PXF, ensure that:
    +
    +-   The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html)
for PXF plug-in installation information.
    +-   All HDFS users have read permissions to HDFS services and that write permissions
have been restricted to specific users.
    +
    +## <a id="hdfsplugin_writeextdata"></a>Writing to PXF External Tables
    +The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
    +
    +Use the following syntax to create a HAWQ external writable table representing HDFS data: 
    +
    +``` sql
    +CREATE EXTERNAL WRITABLE TABLE <table_name> 
    +    ( <column_name> <data_type> [, ...] | LIKE <other_table> )
    +LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
    +    ?PROFILE=HdfsTextSimple|SequenceWritable[&<custom-option>=<value>[...]]')
    +FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
    +```
    +
    +HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html)
call are described in the table below.
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| \<host\>[:\<port\>]    | The HDFS NameNode and port. |
    +| \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
    +| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`
or `SequenceWritable`. |
    +| \<custom-option\>  | \<custom-option\> is profile-specific. These options
are discussed in the next topic.|
    +| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\>
will reference a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports
only the built-in `(delimiter=<delim>)` \<formatting-property\>. |
    +| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \<path-to-hdfs-file\>
will reference a comma-separated value file.  |
    +| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile.
The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')`
(write) and `(formatter='pxfwritable_import')` (read) \<formatting-properties\>. |
    +
    +**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your
`FORMAT` specification.
    +
    +## <a id="profile_hdfstextsimple"></a>Custom Options
    +
    +The `HdfsTextSimple` and `SequenceWritable` profiles support the following \<custom-options\>:
    +
    +| Keyword  | Value Description |
    +|-------|-------------------------------------|
    +| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not
provided, no data compression is performed. Supported compression codecs include: `org.apache.hadoop.io.compress.DefaultCodec`,
`org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec`
(`HdfsTextSimple` profile only) |
    +| COMPRESSION_TYPE    | The compression type to employ; supported values are `RECORD`
(the default) or `BLOCK`. |
    +| DATA-SCHEMA    | (`SequenceWritable` profile only) The name of the writer serialization/deserialization
class. The jar file in which this class resides must be in the PXF class path. This option
has no default value. |
    +| THREAD-SAFE | Boolean value determining whether a table query can run in multi-threaded mode.
The default value is `TRUE`: requests run in multi-threaded mode. When set to `FALSE`, requests
are handled in a single thread.  Set `THREAD-SAFE` to `FALSE` when performing operations
that are not thread-safe (for example, compression). |
    +
    +## <a id="profile_hdfstextsimple"></a>HdfsTextSimple Profile
    +
    +Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where
each row is a single record.
    +
    +Writable tables created using the `HdfsTextSimple` profile can be uncompressed, or can use
record or block compression. When compression is used, the default, gzip, and bzip2 Hadoop
compression codecs are supported:
    +
    +- org.apache.hadoop.io.compress.DefaultCodec
    +- org.apache.hadoop.io.compress.GzipCodec
    +- org.apache.hadoop.io.compress.BZip2Codec
    +
    +\<formatting-properties\> supported by the `HdfsTextSimple` profile include:
    +
    +| Keyword  | Value |
    +|-------|-------------------------------------|
    +| delimiter    | The delimiter character to use when writing the file. Default value
is a comma `,`.|
    +
    +
    +### <a id="profile_hdfstextsimple_writing"></a>Example: Writing Using the
HdfsTextSimple Profile
    +
    +This example uses the data schema introduced in [Example: Using the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query):
    +
    +
    +| Field Name  | Data Type |
    +|-------|-------------------------------------|
    +| location | text |
    +| month | text |
    +| number\_of\_orders | int |
    +| total\_sales | float8 |
    +
    +
    +Perform the following operations to use the PXF `HdfsTextSimple` profile to create a
HAWQ writable external table with the same data schema as defined above. You will also create
a separate external readable table to read the associated HDFS file.
    +
    +1. Create a writable HAWQ external table with the data schema described above. Write
to the HDFS file `/data/pxf_examples/pxfwritable_hdfs_textsimple1`. Create the table specifying
a comma `,` as the delimiter:
    +
    +    ``` sql
    +    gpadmin=# CREATE WRITABLE EXTERNAL TABLE pxf_hdfs_writabletbl_1(location text, month
text, num_orders int, total_sales float8)
    +                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
    +              FORMAT 'TEXT' (delimiter=E',');
    +    ```
    +    
    +    The `FORMAT` subclause `delimiter` value is specified as the single ASCII comma character
','. `E` escapes the character.
    +
    +2. Write a few records to the `pxfwritable_hdfs_textsimple1` HDFS file by invoking the
SQL `INSERT` command on `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Frankfurt', 'Mar', 777, 3956.98
);
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 VALUES ( 'Cleveland', 'Oct', 3812, 96645.37
);
    +    ```
    +
    +3. Insert the contents of the `pxf_hdfs_textsimple` table created in [Example: Using
the HdfsTextSimple Profile](HDFSFileDataPXF.html#profile_hdfstextsimple_query) into `pxf_hdfs_writabletbl_1`:
    +
    +    ``` sql
    +    gpadmin=# INSERT INTO pxf_hdfs_writabletbl_1 SELECT * FROM pxf_hdfs_textsimple;
    +    ```
    +
    +4. View the file contents in HDFS:
    +
    +    ``` shell
    +    $ hdfs dfs -cat /data/pxf_examples/pxfwritable_hdfs_textsimple1/*
    +    Frankfurt,Mar,777,3956.98
    +    Cleveland,Oct,3812,96645.37
    +    Prague,Jan,101,4875.33
    +    Rome,Mar,87,1557.39
    +    Bangalore,May,317,8936.99
    +    Beijing,Jul,411,11600.67
    +    ```
    +
    +    Because you specified comma `,` as the delimiter, this character is the field separator
used in each record of the HDFS file.
    +
    +5. You may recall that querying an external writable table is not supported in HAWQ.
To read the newly-created writable table, create a readable external HAWQ table referencing
the writable table's HDFS file:
    --- End diff --
    
    Remove "You may recall that". Also, change the second sentence to something like: "To query
data from the newly-created HDFS file, create a readable..."
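
    For illustration, a minimal sketch of the readable table that step 5 goes on to describe,
    assuming the same columns, NameNode address, and HDFS path as the writable table in step 1.
    The table name `pxf_hdfs_textsimple_r1` is hypothetical and does not come from the diff:

    ``` sql
    -- Hypothetical readable counterpart to pxf_hdfs_writabletbl_1.
    -- Columns, host:port, and HDFS path are copied from step 1 of the diff;
    -- only the table name is invented for this sketch.
    CREATE EXTERNAL TABLE pxf_hdfs_textsimple_r1(location text, month text, num_orders int, total_sales float8)
                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxfwritable_hdfs_textsimple1?PROFILE=HdfsTextSimple')
              FORMAT 'TEXT' (delimiter=E',');

    -- Query the data through the readable table, since the writable
    -- table itself cannot be queried.
    SELECT * FROM pxf_hdfs_textsimple_r1 ORDER BY total_sales;
    ```

    The delimiter matches the one used when writing, so each comma-separated record in the
    HDFS file should map back onto the four columns.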


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---
