hawq-commits mailing list archives

From: yo...@apache.org
Subject: [04/14] incubator-hawq-docs git commit: remove SerialWritable, use namenode for host
Date: Wed, 26 Oct 2016 18:31:04 GMT
remove SerialWritable, use namenode for host


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/fd029d56
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/fd029d56
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/fd029d56

Branch: refs/heads/develop
Commit: fd029d568589f5a4e2461d92437963d97f7d3198
Parents: 5a941a7
Author: Lisa Owen <lowen@pivotal.io>
Authored: Thu Oct 20 12:20:21 2016 -0700
Committer: Lisa Owen <lowen@pivotal.io>
Committed: Thu Oct 20 12:20:21 2016 -0700

----------------------------------------------------------------------
 pxf/HDFSFileDataPXF.html.md.erb | 62 ++++--------------------------------
 1 file changed, 7 insertions(+), 55 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/fd029d56/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index 2f87037..9914ca9 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -2,7 +2,7 @@
 title: Accessing HDFS File Data
 ---
 
-HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in supports plain delimited and comma-separated-value text files.  The HDFS plug-in also supports Avro and SequenceFile binary formats.
+HDFS is the primary distributed storage mechanism used by Apache Hadoop applications. The PXF HDFS plug-in reads file data stored in HDFS.  The plug-in supports plain delimited and comma-separated-value text files.  The HDFS plug-in also supports the Avro binary format.
 
 This section describes how to use PXF to access HDFS data, including how to create and query an external table from files in the HDFS data store.
 
@@ -15,10 +15,9 @@ Before working with HDFS file data using HAWQ and PXF, ensure that:
 
 ## <a id="hdfsplugin_fileformats"></a>HDFS File Formats
 
-The PXF HDFS plug-in supports the following file formats:
+The PXF HDFS plug-in supports reading the following file formats:
 
 - TextFile - comma-separated value (.csv) or delimited format plain text file
-- SequenceFile - flat file consisting of binary key/value pairs
 - Avro - JSON-defined, schema-based data serialization format
 
 The PXF HDFS plug-in includes the following profiles to support the file formats listed above:
@@ -26,7 +25,6 @@ The PXF HDFS plug-in includes the following profiles to support the file formats
 - `HdfsTextSimple` - text files
 - `HdfsTextMulti` - text files with embedded line feeds
 - `Avro` - Avro files
-- `SequenceWritable` - SequenceFile  (write only?)
 
 
 ## <a id="hdfsplugin_cmdline"></a>HDFS Shell Commands
@@ -109,7 +107,7 @@ $ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_tm.txt /data/pxf_examples/
 You will use these HDFS files in later sections.
 
 ## <a id="hdfsplugin_queryextdata"></a>Querying External HDFS Data
-The PXF HDFS plug-in supports several profiles. These include `HdfsTextSimple`, `HdfsTextMulti`, `Avro`, and `SequenceWritable`.
+The PXF HDFS plug-in supports several profiles. These include `HdfsTextSimple`, `HdfsTextMulti`, and `Avro`.
 
 Use the following syntax to create a HAWQ external table representing HDFS data: 
 
@@ -117,7 +115,7 @@ Use the following syntax to create a HAWQ external table representing HDFS data:
 CREATE EXTERNAL TABLE <table_name> 
     ( <column_name> <data_type> [, ...] | LIKE <other_table> )
 LOCATION ('pxf://<host>[:<port>]/<path-to-hdfs-file>
-    ?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro|SequenceWritable[&<custom-option>=<value>[...]]')
+    ?PROFILE=HdfsTextSimple|HdfsTextMulti|Avro[&<custom-option>=<value>[...]]')
 FORMAT '[TEXT|CSV|CUSTOM]' (<formatting-properties>);
 ```
 
@@ -127,12 +125,11 @@ HDFS-plug-in-specific keywords and values used in the [CREATE EXTERNAL
TABLE](..
 |-------|-------------------------------------|
 | \<host\>[:\<port\>]    | The HDFS NameNode and port. |
 | \<path-to-hdfs-file\>    | The path to the file in the HDFS data store. |
-| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, `SequenceWritable`, or `Avro`. |
+| PROFILE    | The `PROFILE` keyword must specify one of the values `HdfsTextSimple`, `HdfsTextMulti`, or `Avro`. |
 | \<custom-option\>  | \<custom-option\> is profile-specific. Profile-specific options are discussed in the relevant profile topic later in this section. |
 | FORMAT 'TEXT' | Use '`TEXT`' `FORMAT` with the `HdfsTextSimple` profile when \<path-to-hdfs-file\> references a plain text delimited file.  |
 | FORMAT 'CSV' | Use '`CSV`' `FORMAT` with `HdfsTextSimple` and `HdfsTextMulti` profiles when \<path-to-hdfs-file\> references a comma-separated value file.  |
 | FORMAT 'CUSTOM' | Use the `CUSTOM` `FORMAT` with the `Avro` profile. The `Avro` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_import')` \<formatting-property\> |
-| FORMAT 'CUSTOM' | Use the `CUSTOM` `FORMAT` with the `SequenceWritable` profile. The `SequenceWritable` '`CUSTOM`' `FORMAT` supports only the built-in `(formatter='pxfwritable_export')` \<formatting-property\> |
 | \<formatting-properties\>    | \<formatting-properties\> are profile-specific. Profile-specific formatting options are discussed in the relevant profile topic later in this section. |
 
 *Note*: When creating PXF external tables, you cannot use the `HEADER` option in your `FORMAT` specification.
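
For reference, a minimal `HdfsTextSimple` table built with this syntax might look like the following sketch; the file name and column list here are hypothetical:

``` sql
-- hypothetical file and columns; adjust to match your HDFS data
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_example(location text, month text, num_orders int, total_sales float8)
            LOCATION ('pxf://namenode:51200/data/pxf_examples/example.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
gpadmin=# SELECT * FROM pxf_hdfs_example;
```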
@@ -192,7 +189,7 @@ The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryabl
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
-            LOCATION ('pxf://sandbox.hortonworks.com:51200/data/pxf_examples/pxf_hdfs_tm.txt?PROFILE=HdfsTextMulti')
+            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_tm.txt?PROFILE=HdfsTextMulti')
           FORMAT 'CSV' (delimiter=E':');
 gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
 ```
@@ -358,7 +355,7 @@ Create a queryable external table from this Avro file:
 
 ``` sql
 gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
-            LOCATION ('pxf://sandbox.hortonworks.com:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
+            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
           FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
 ```
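
A simple follow-up query against this table might look like the sketch below; the exact output depends on your Avro data, with complex fields returned as text using the delimiters specified in the `LOCATION` URI:

``` sql
-- sketch: column names follow the table definition above
gpadmin=# SELECT id, username, followers FROM pxf_hdfs_avro;
```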
 
@@ -393,51 +390,6 @@ gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john
  jim      | {number:9,street:deer creek,city:palo alto}
 ```
 
-## <a id="profile_hdfsseqwritable"></a>SequenceWritable Profile 
-
-Use the `SequenceWritable` profile when writing SequenceFile format files. Files of this type consist of binary key/value pairs. Sequence files are a common data transfer format between MapReduce jobs.
-
-The `SequenceWritable` profile supports the following \<custom-options\>:
-
-| Keyword  | Value Description |
-|-------|-------------------------------------|
-| COMPRESSION_CODEC    | The compression codec Java class name. If this option is not provided, no data compression is performed. |
-| COMPRESSION_TYPE    | The compression type of the sequence file; supported values are `RECORD` (the default) or `BLOCK`. |
-| DATA-SCHEMA    | The name of the writer serialization class. The jar file in which this class resides must be in the PXF class path. This option has no default value. |
-| THREAD-SAFE | Boolean value determining if a table query can run in multi-thread mode. Default value is `TRUE` - requests can run in multi-thread mode. When set to `FALSE`, requests will be handled in a single thread. |
-
-???? MORE HERE
-
-??? ADDRESS SERIALIZATION
-
-
-## <a id="recordkeyinkey-valuefileformats"></a>Reading the Record Key 
-
-Sequence file and other file formats that store rows in a key-value format can access the key value through HAWQ by using the `recordkey` keyword as a field name.
-
-The field type of `recordkey` must correspond to the key type, much as the other fields must match the HDFS data.
-
-`recordkey` can be any of the following Hadoop types:
-
--   BooleanWritable
--   ByteWritable
--   DoubleWritable
--   FloatWritable
--   IntWritable
--   LongWritable
--   Text
-
-### <a id="example1"></a>Example
-
-A data schema `Babies.class` contains three fields: name (text), birthday (text), weight (float). An external table definition for this schema must include these three fields, and can either include or ignore the `recordkey`.
-
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE babies_1940 (recordkey int, name text, birthday text, weight float)
-            LOCATION ('pxf://namenode:51200/babies_1940s?PROFILE=SequenceWritable&DATA-SCHEMA=Babies')
-          FORMAT 'CUSTOM' (formatter='pxfwritable_import');
-gpadmin=# SELECT * FROM babies_1940;
-```
-
 ## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
 
 To access external HDFS data in a High Availability HDFS cluster, change the URI LOCATION clause to use \<HA-nameservice\> rather than \<host\>[:\<port\>].
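
For example, assuming a hypothetical HA nameservice named `hdfscluster`, a table definition might look like the following sketch:

``` sql
-- 'hdfscluster' is a hypothetical HA nameservice; note that no port is specified
gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_ha_example(location text, num_orders int)
            LOCATION ('pxf://hdfscluster/data/pxf_examples/example.txt?PROFILE=HdfsTextSimple')
          FORMAT 'TEXT' (delimiter=E',');
```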

