hawq-commits mailing list archives

From yo...@apache.org
Subject [30/50] incubator-hawq-docs git commit: use numbered steps throughout
Date Mon, 31 Oct 2016 22:13:40 GMT
use numbered steps throughout


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/f9c340af
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/f9c340af
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/f9c340af

Branch: refs/heads/tutorial-proto
Commit: f9c340af7fc117da341e63689262fa3e6b2da0da
Parents: cd6d034
Author: Lisa Owen <lowen@pivotal.io>
Authored: Tue Oct 25 14:02:27 2016 -0700
Committer: Lisa Owen <lowen@pivotal.io>
Committed: Tue Oct 25 14:02:27 2016 -0700

----------------------------------------------------------------------
 pxf/HDFSFileDataPXF.html.md.erb | 321 +++++++++++++++++++----------------
 1 file changed, 174 insertions(+), 147 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/f9c340af/pxf/HDFSFileDataPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/HDFSFileDataPXF.html.md.erb b/pxf/HDFSFileDataPXF.html.md.erb
index 06f8b4d..a853e3e 100644
--- a/pxf/HDFSFileDataPXF.html.md.erb
+++ b/pxf/HDFSFileDataPXF.html.md.erb
@@ -47,7 +47,7 @@ Perform the following steps to create data files used in subsequent exercises:
 1. Create an HDFS directory for PXF example data files:
 
     ``` shell
-     $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
+    $ sudo -u hdfs hdfs dfs -mkdir -p /data/pxf_examples
     ```
 
 2. Create a delimited plain text file:
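[Editorial aside, not part of the commit: the delimited sample file this hunk references is not shown here, but its contents can be reconstructed from the query output further down in the diff. A sketch, with field order and values inferred from that output:]

``` shell
# Recreate the four-row sample file that the HdfsTextSimple queries below return;
# values are taken from the query output shown further down in this diff.
cat > /tmp/pxf_hdfs_simple.txt <<'EOF'
Prague,Jan,101,4875.33
Rome,Mar,87,1557.39
Bangalore,May,317,8936.99
Beijing,Jul,411,11600.67
EOF

# Quick check: four records, four comma-delimited fields each.
awk -F',' 'NF != 4 { bad = 1 } END { print NR " rows; fields ok: " (bad ? "no" : "yes") }' /tmp/pxf_hdfs_simple.txt
```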
@@ -148,36 +148,39 @@ Use the `HdfsTextSimple` profile when reading plain text delimited or .csv files
 |-------|-------------------------------------|
 | delimiter    | The delimiter character in the file. Default value is a comma `,`.|
 
+### <a id="profile_hdfstextsimple_query"></a>Query With HdfsTextSimple Profile
 
-The following SQL call uses the PXF `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section:
+Perform the following steps to create and query external tables accessing the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section.
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
-          FORMAT 'TEXT' (delimiter=E',');
-gpadmin=# SELECT * FROM pxf_hdfs_textsimple;          
-```
+1. Use the `HdfsTextSimple` profile to create a queryable HAWQ external table from the `pxf_hdfs_simple.txt` file you created and added to HDFS in an earlier section:
 
-``` pre
-   location    | month | num_orders | total_sales 
----------------+-------+------------+-------------
- Prague        | Jan   |        101 |     4875.33
- Rome          | Mar   |         87 |     1557.39
- Bangalore     | May   |        317 |     8936.99
- Beijing       | Jul   |        411 |    11600.67
-(4 rows)
-```
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'TEXT' (delimiter=E',');
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple;          
+    ```
 
-Create a second external table from `pxf_hdfs_simple.txt`, this time using the `CSV` `FORMAT`:
+    ``` pre
+       location    | month | num_orders | total_sales 
+    ---------------+-------+------------+-------------
+     Prague        | Jan   |        101 |     4875.33
+     Rome          | Mar   |         87 |     1557.39
+     Bangalore     | May   |        317 |     8936.99
+     Beijing       | Jul   |        411 |    11600.67
+    (4 rows)
+    ```
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month text, num_orders int, total_sales float8)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
-          FORMAT 'CSV';
-gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;          
-```
+2. Create a second external table from `pxf_hdfs_simple.txt`, this time using the `CSV` `FORMAT`:
 
-Notice: When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required, as comma is the default.
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textsimple_csv(location text, month text, num_orders int, total_sales float8)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_simple.txt?PROFILE=HdfsTextSimple')
+              FORMAT 'CSV';
+    gpadmin=# SELECT * FROM pxf_hdfs_textsimple_csv;          
+    ```
+
+    Notice: When specifying `FORMAT 'CSV'` for a comma-separated value file, no `delimiter` formatter option is required, as comma is the default.
 
 ## <a id="profile_hdfstextmulti"></a>HdfsTextMulti Profile
 
@@ -189,30 +192,39 @@ Use the `HdfsTextMulti` profile when reading plain text files with delimited sin
 |-------|-------------------------------------|
 | delimiter    | The delimiter character in the file. |
 
-The following SQL call uses the PXF `HdfsTextMulti` profile to create a queryable HAWQ external table from the `pxf_hdfs_multi.txt` file you created and added to HDFS in an earlier section:
+### <a id="profile_hdfstextmulti_query"></a>Query With HdfsTextMulti Profile
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
-          FORMAT 'CSV' (delimiter=E':');
-gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
-```
+Perform the following operations to create and query an external HAWQ table accessing the `pxf_hdfs_multi.txt` file you created and added to HDFS in an earlier section.
 
-``` pre
-         address          | month | year 
---------------------------+-------+------
- 4627 Star Rd.            | Sept  | 2017
- San Francisco, CA  94107           
- 113 Moon St.             | Jan   | 2018
- San Diego, CA  92093               
- 51 Belt Ct.              | Dec   | 2016
- Denver, CO  90123                  
- 93114 Radial Rd.         | Jul   | 2017
- Chicago, IL  60605                 
- 7301 Brookview Ave.      | Dec   | 2018
- Columbus, OH  43213                
-(5 rows)
-```
+1. Use the `HdfsTextMulti` profile to create a queryable external table from the `pxf_hdfs_multi.txt` file:
+
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_textmulti(address text, month text, year int)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_multi.txt?PROFILE=HdfsTextMulti')
+              FORMAT 'CSV' (delimiter=E':');
+    ```
+    
+2. Query the `pxf_hdfs_textmulti` table:
+
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_textmulti;
+    ```
+
+    ``` pre
+             address          | month | year 
+    --------------------------+-------+------
+     4627 Star Rd.            | Sept  | 2017
+     San Francisco, CA  94107           
+     113 Moon St.             | Jan   | 2018
+     San Diego, CA  92093               
+     51 Belt Ct.              | Dec   | 2016
+     Denver, CO  90123                  
+     93114 Radial Rd.         | Jul   | 2017
+     Chicago, IL  60605                 
+     7301 Brookview Ave.      | Dec   | 2018
+     Columbus, OH  43213                
+    (5 rows)
+    ```
 
 ## <a id="profile_hdfsavro"></a>Avro Profile
 
@@ -252,7 +264,7 @@ The `Avro` profile supports the following \<custom-options\>:
 | RECORDKEY_DELIM | The delimiter character(s) to place between the field name and value of a record entry when PXF maps an Avro complex data type to a text column. The default is a colon `:` character. |
 
 
-### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas
+### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts"></a>Avro Schemas and Data
 
 Avro schemas are defined using JSON, and composed of the same primitive and complex types identified in the data mapping section above. Avro schema files typically have a `.avsc` suffix.
 
@@ -267,133 +279,148 @@ The examples in this section will be operating on Avro data fields with the foll
 - address - record comprised of street number (int), street name (string), and city (string)
 - relationship - enumerated type
 
-Create an Avro schema file to represent the above schema:
+#### <a id="topic_tr3_dpg_ts__section_m2p_ztg_ts_99"></a>Create Sample Schema
 
-``` shell
-$ vi /tmp/avro_schema.avsc
-```
+Perform the following operations to create an Avro schema to represent the example schema described above.
 
-Copy and paste the following text into `avro_schema.avsc`:
-
-``` json
-{
-"type" : "record",
-  "name" : "example_schema",
-  "namespace" : "com.example",
-  "fields" : [ {
-    "name" : "id",
-    "type" : "long",
-    "doc" : "Id of the user account"
-  }, {
-    "name" : "username",
-    "type" : "string",
-    "doc" : "Name of the user account"
-  }, {
-    "name" : "followers",
-    "type" : {"type": "array", "items": "string"},
-    "doc" : "Users followers"
-  }, {
-    "name": "fmap",
-    "type": {"type": "map", "values": "long"}
-  }, {
-    "name": "relationship",
-    "type": {
-        "type": "enum",
-        "name": "relationshipEnum",
-        "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
-    }
-  }, {
-    "name": "address",
-    "type": {
-        "type": "record",
-        "name": "addressRecord",
-        "fields": [
-            {"name":"number", "type":"int"},
-            {"name":"street", "type":"string"},
-            {"name":"city", "type":"string"}]
+1. Create a file named `avro_schema.avsc`:
+
+    ``` shell
+    $ vi /tmp/avro_schema.avsc
+    ```
+
+2. Copy and paste the following text into `avro_schema.avsc`:
+
+    ``` json
+    {
+    "type" : "record",
+      "name" : "example_schema",
+      "namespace" : "com.example",
+      "fields" : [ {
+        "name" : "id",
+        "type" : "long",
+        "doc" : "Id of the user account"
+      }, {
+        "name" : "username",
+        "type" : "string",
+        "doc" : "Name of the user account"
+      }, {
+        "name" : "followers",
+        "type" : {"type": "array", "items": "string"},
+        "doc" : "Users followers"
+      }, {
+        "name": "fmap",
+        "type": {"type": "map", "values": "long"}
+      }, {
+        "name": "relationship",
+        "type": {
+            "type": "enum",
+            "name": "relationshipEnum",
+            "symbols": ["MARRIED","LOVE","FRIEND","COLLEAGUE","STRANGER","ENEMY"]
+        }
+      }, {
+        "name": "address",
+        "type": {
+            "type": "record",
+            "name": "addressRecord",
+            "fields": [
+                {"name":"number", "type":"int"},
+                {"name":"street", "type":"string"},
+                {"name":"city", "type":"string"}]
+        }
+      } ],
+      "doc:" : "A basic schema for storing messages"
     }
-  } ],
-  "doc:" : "A basic schema for storing messages"
-}
-```
+    ```
 
-### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Sample Avro Data (JSON)
+#### <a id="topic_tr3_dpg_ts__section_spk_15g_ts"></a>Create Sample Avro Data File (JSON)
 
-Create a text file named `pxf_hdfs_avro.txt`:
+Perform the following steps to create a sample Avro data file conforming to the above schema.
 
-``` shell
-$ vi /tmp/pxf_hdfs_avro.txt
-```
+1.  Create a text file named `pxf_hdfs_avro.txt`:
+
+    ``` shell
+    $ vi /tmp/pxf_hdfs_avro.txt
+    ```
 
-Enter the following data into `pxf_hdfs_avro.txt`:
+2. Enter the following data into `pxf_hdfs_avro.txt`:
 
-``` pre
-{"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
+    ``` pre
+    {"id":1, "username":"john","followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
     
-{"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
-```
+    {"id":2, "username":"jim","followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
+    ```
 
-The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values.
+    The sample data uses a comma `,` to separate top level records and a colon `:` to separate map/key values and record field name/values.
 
-Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar file resides in the current directory:
+3. Convert the text file to Avro format. There are various ways to perform the conversion programmatically and via the command line. In this example, we use the [Java Avro tools](http://avro.apache.org/releases.html), and the jar file resides in the current directory:
 
-``` shell
-$ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
-```
+    ``` shell
+    $ java -jar ./avro-tools-1.8.1.jar fromjson --schema-file /tmp/avro_schema.avsc /tmp/pxf_hdfs_avro.txt > /tmp/pxf_hdfs_avro.avro
+    ```
 
-The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`. Copy this file to HDFS:
+    The generated Avro binary data file is written to `/tmp/pxf_hdfs_avro.avro`. 
+    
+4. Copy the generated file to HDFS:
 
-``` shell
-$ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
-```
-### <a id="topic_avro_querydata"></a>Querying Avro Data
+    ``` shell
+    $ sudo -u hdfs hdfs dfs -put /tmp/pxf_hdfs_avro.avro /data/pxf_examples/
+    ```
+    
+### <a id="topic_avro_querydata"></a>Querying With Avro Profile
 
-Create a queryable external table from this Avro file:
+Perform the following steps to create and query an external table accessing the `pxf_hdfs_avro.avro` file you added to HDFS in the previous section. When creating the table:
 
 -  Map the top-level primitive fields, `id` (type long) and `username` (type string), to their equivalent HAWQ types (bigint and text).
 -  Map the remaining complex fields to type text.
--  Explicitly set the record, map, and collection delimiters using the Avro profile custom options:
+-  Explicitly set the record, map, and collection delimiters using the Avro profile custom options.
 
-``` sql
-gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
-            LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
-          FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
-```
 
-A simple query of the external table shows the components of the complex type data separated with delimiters:
+1. Use the `Avro` profile to create a queryable external table from the `pxf_hdfs_avro.avro` file:
 
-``` sql
-gpadmin=# SELECT * FROM pxf_hdfs_avro;
-```
+    ``` sql
+    gpadmin=# CREATE EXTERNAL TABLE pxf_hdfs_avro(id bigint, username text, followers text, fmap text, relationship text, address text)
+                LOCATION ('pxf://namenode:51200/data/pxf_examples/pxf_hdfs_avro.avro?PROFILE=Avro&COLLECTION_DELIM=,&MAPKEY_DELIM=:&RECORDKEY_DELIM=:')
+              FORMAT 'CUSTOM' (FORMATTER='pxfwritable_import');
+    ```
 
-``` pre
- id | username |   followers    |        fmap         | relationship |                      address                      
-----+----------+----------------+--------------------+--------------+---------------------------------------------------
-  1 | john     | [kate,santosh] | {kate:10,santosh:4} | FRIEND       | {number:1,street:renaissance drive,city:san jose}
-  2 | jim      | [john,pam]     | {pam:3,john:3}      | COLLEAGUE    | {number:9,street:deer creek,city:palo alto}
-(2 rows)
-```
+2. Perform a simple query of the `pxf_hdfs_avro` table:
 
-Process the delimited components in the text columns as necessary for your application. For example, the following command uses the HAWQ internal `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
+    ``` sql
+    gpadmin=# SELECT * FROM pxf_hdfs_avro;
+    ```
 
-``` sql
-gpadmin=# CREATE VIEW followers_view AS 
+    ``` pre
+     id | username |   followers    |        fmap         | relationship |                      address                      
+    ----+----------+----------------+--------------------+--------------+---------------------------------------------------
+      1 | john     | [kate,santosh] | {kate:10,santosh:4} | FRIEND       | {number:1,street:renaissance drive,city:san jose}
+      2 | jim      | [john,pam]     | {pam:3,john:3}      | COLLEAGUE    | {number:9,street:deer creek,city:palo alto}
+    (2 rows)
+    ```
+
+    The simple query of the external table shows the components of the complex type data separated with delimiters.
+
+
+3. Process the delimited components in the text columns as necessary for your application. For example, the following command uses the HAWQ internal `string_to_array` function to convert entries in the `followers` field to a text array column in a new view.
+
+    ``` sql
+    gpadmin=# CREATE VIEW followers_view AS 
   SELECT username, address, string_to_array(substring(followers FROM 2 FOR (char_length(followers) - 2)), ',')::text[] 
-    AS followers 
-  FROM pxf_hdfs_avro;
-```
+        AS followers 
+      FROM pxf_hdfs_avro;
+    ```
 
-Query the view to filter rows based on whether a particular follower appears in the array:
+4. Query the view to filter rows based on whether a particular follower appears in the array:
 
-```
-gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
-```
+    ``` sql
+    gpadmin=# SELECT username, address FROM followers_view WHERE followers @> '{john}';
+    ```
 
-``` pre
- username |                   address                   
-----------+---------------------------------------------
- jim      | {number:9,street:deer creek,city:palo alto}
-```
+    ``` pre
+     username |                   address                   
+    ----------+---------------------------------------------
+     jim      | {number:9,street:deer creek,city:palo alto}
+    ```
 
 ## <a id="accessdataonahavhdfscluster"></a>Accessing HDFS Data in a High Availability HDFS Cluster
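[Editorial aside, not part of the commit: the Avro walkthrough in the diff can be sanity-checked locally without HAWQ or Hadoop. The sketch below parses the two sample JSON records from the diff, confirms they carry exactly the field names `avro_schema.avsc` declares, and reimplements the final `followers @> '{john}'` filter in plain Python as an analogue, not the actual HAWQ behavior:]

``` python
import json

# Field names declared by avro_schema.avsc in the diff above.
SCHEMA_FIELDS = {"id", "username", "followers", "fmap", "relationship", "address"}

# The two sample records from pxf_hdfs_avro.txt, one JSON object per line.
SAMPLE = """
{"id":1, "username":"john", "followers":["kate", "santosh"], "relationship": "FRIEND", "fmap": {"kate":10,"santosh":4}, "address":{"number":1, "street":"renaissance drive", "city":"san jose"}}
{"id":2, "username":"jim", "followers":["john", "pam"], "relationship": "COLLEAGUE", "fmap": {"john":3,"pam":3}, "address":{"number":9, "street":"deer creek", "city":"palo alto"}}
"""

records = [json.loads(line) for line in SAMPLE.strip().splitlines()]

# Each record must supply exactly the fields the schema declares.
for rec in records:
    assert set(rec) == SCHEMA_FIELDS, rec

# Plain-Python analogue of:
#   SELECT username FROM followers_view WHERE followers @> '{john}'
matches = [r["username"] for r in records if "john" in r["followers"]]
print(matches)  # ['jim']
```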
 

