From: dyozie
To: dev@hawq.incubator.apache.org
Subject: [GitHub] incubator-hawq-docs pull request #46: HAWQ-1119 - create doc content for PXF...
Date: Mon, 31 Oct 2016 22:25:50 +0000 (UTC)

Github user dyozie commented on a diff in the pull request:

https://github.com/apache/incubator-hawq-docs/pull/46#discussion_r85813059

--- Diff: pxf/HDFSWritablePXF.html.md.erb ---
@@ -0,0 +1,410 @@
+---
+title: Writing Data to HDFS
+---
+
+The PXF HDFS plug-in supports writable external tables using the `HdfsTextSimple` and `SequenceWritable` profiles. You might create a writable table to export data from a HAWQ internal table to HDFS.
+
+This section describes how to use these PXF profiles to create writable external tables.
+
+**Note**: You cannot directly query data in a HAWQ writable table. After creating the external writable table, you must create a HAWQ readable external table accessing the HDFS file, then query that table. ??You can also create a Hive table to access the HDFS file.??
+
+## Prerequisites
+
+Before working with HDFS file data using HAWQ and PXF, ensure that:
+
+- The HDFS plug-in is installed on all cluster nodes. See [Installing PXF Plug-ins](InstallPXFPlugins.html) for PXF plug-in installation information.
+- All HDFS users have read permissions to HDFS services and write permissions are restricted to specific users.
+
+## Writing to PXF External Tables
+The PXF HDFS plug-in supports two writable profiles: `HdfsTextSimple` and `SequenceWritable`.
+
+Use the following syntax to create a HAWQ writable external table representing HDFS data:
+
+``` sql
+CREATE WRITABLE EXTERNAL TABLE
+    ( [, ...] | LIKE )
+LOCATION ('pxf://[:]/
+    ?PROFILE=HdfsTextSimple|SequenceWritable[&=[...]]')
+FORMAT '[TEXT|CSV|CUSTOM]' ();
+```
+
+HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL TABLE](../reference/sql/CREATE-EXTERNAL-TABLE.html) call are described in the table below.
+
+| Keyword | Value |
+|-------|-------------------------------------|
+| \[:\] | The HDFS NameNode and port. |
+| \ | The path to the file in the HDFS data store. |
+| PROFILE | The `PROFILE` keyword must specify one of the values `HdfsTextSimple` or `SequenceWritable`. |
+| \ | \ is profile-specific. These options are discussed in the next topic. |
+| FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \ references a plain text delimited file. The `HdfsTextSimple` '`TEXT`' `FORMAT` supports only the built-in `(delimiter=)` \. |
+| FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` when \ references a comma-separated value file. |
+| FORMAT 'CUSTOM' | Use the `'CUSTOM'` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` (write) and `(formatter='pxfwritable_import')` (read) \. |
+
+**Note**: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
+
+## Custom Options
+
+The `HdfsTextSimple` and `SequenceWritable` profiles support the following \:
+
+| Keyword | Value Description |
+|-------|-------------------------------------|
+| COMPRESSION_CODEC | The compression codec Java class name. If this option is not provided, no data compression is performed. Supported compression codecs include `org.apache.hadoop.io.compress.DefaultCodec`, `org.apache.hadoop.io.compress.BZip2Codec`, and `org.apache.hadoop.io.compress.GzipCodec` (`HdfsTextSimple` profile only). |
+| COMPRESSION_TYPE | The compression type to employ; supported values are `RECORD` (the default) or `BLOCK`. |
+| DATA-SCHEMA | (`SequenceWritable` profile only) The name of the writer serialization/deserialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
+| THREAD-SAFE | Boolean value determining whether a table query can run in multi-threaded mode. The default value is `TRUE`, in which case requests run in multi-threaded mode. When set to `FALSE`, requests are handled in a single thread. Set `THREAD-SAFE` appropriately when performing operations that are not thread-safe (for example, compression). |
+
+## HdfsTextSimple Profile
+
+Use the `HdfsTextSimple` profile when writing delimited data to a plain text file where each row is a single record.
+
+Writable tables created using the `HdfsTextSimple` profile can use no, record, or block compression. When compression is used, the default, gzip, and bzip2 Hadoop compression codecs are supported:
--- End diff --

Small edit:

Writable tables created using the HdfsTextSimple profile can optionally use `record` or `block` compression. The following compression codecs are supported when compression is enabled:
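For reference, a minimal sketch of the `HdfsTextSimple` write-then-read workflow described in the diff, using the `COMPRESSION_CODEC` custom option with the gzip codec. It assumes a PXF service reachable at `namenode:51200` and an existing HAWQ internal table named `sales_internal`; those names, the column list, and the HDFS path are hypothetical placeholders, not values taken from the PR.

``` sql
-- Hypothetical placeholders: namenode:51200 (PXF host/port), /data/example/sales_export
-- (HDFS path), and the sales_export, sales_export_r, and sales_internal table names.

-- Writable table: HdfsTextSimple profile, comma-delimited TEXT, gzip-compressed output.
CREATE WRITABLE EXTERNAL TABLE sales_export
    (location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/example/sales_export?PROFILE=HdfsTextSimple&COMPRESSION_CODEC=org.apache.hadoop.io.compress.GzipCodec')
FORMAT 'TEXT' (delimiter=',');

-- Export rows from the (assumed) HAWQ internal table to HDFS.
INSERT INTO sales_export
SELECT location, month, num_orders, total_sales FROM sales_internal;

-- Writable tables cannot be queried directly, so define a readable external table
-- over the same HDFS path (assumes the readable profile decompresses gzip input).
CREATE EXTERNAL TABLE sales_export_r
    (location text, month text, num_orders int, total_sales float8)
LOCATION ('pxf://namenode:51200/data/example/sales_export?PROFILE=HdfsTextSimple')
FORMAT 'TEXT' (delimiter=',');

SELECT * FROM sales_export_r;
```

The readable table reuses the same profile, path, and delimiter as the writable one so that the exported rows can be queried back.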