hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #39: HAWQ-1071 - add examples for HiveText ...
Date Thu, 27 Oct 2016 16:11:15 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/39#discussion_r85371576
  
    --- Diff: pxf/HivePXF.html.md.erb ---
    @@ -151,184 +477,120 @@ To enable HCatalog query integration in HAWQ, perform the following steps:
         postgres=# GRANT ALL ON PROTOCOL pxf TO "role";
         ``` 
         
    -3.  To query a Hive table with HCatalog integration, simply query HCatalog directly from HAWQ. The query syntax is:
     
    -    ``` sql
    -    postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
    -    ```
    +To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The query syntax is:
    +
    +``` sql
    +postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
    +```
     
    -    For example:
    +For example:
     
    -    ``` sql
    -    postgres=# SELECT * FROM hcatalog.default.sales;
    -    ```
    -
    -4.  To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
    -    -   Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
    -
    -        ``` shell
    -        $ psql -d postgres
    -        postgres=# \d hcatalog.default.test
    -
    -        PXF Hive Table "default.test"
    -            Column    |  Type  
    -        --------------+--------
    -         name         | text
    -         type         | text
    -         supplier_key | int4
    -         full_price   | float8 
    -        ```
    -    -   Use `\d hcatalog.hive-db-name.*` to describe the whole database schema. For example:
    -
    -        ``` shell
    -        postgres=# \d hcatalog.default.*
    -
    -        PXF Hive Table "default.test"
    -            Column    |  Type  
    -        --------------+--------
    -         type         | text
    -         name         | text
    -         supplier_key | int4
    -         full_price   | float8
    -
    -        PXF Hive Table "default.testabc"
    -         Column | Type 
    -        --------+------
    -         type   | text
    -         name   | text
    -        ```
    -    -   Use `\d hcatalog.*.*` to describe the whole schema:
    -
    -        ``` shell
    -        postgres=# \d hcatalog.*.*
    -
    -        PXF Hive Table "default.test"
    -            Column    |  Type  
    -        --------------+--------
    -         type         | text
    -         name         | text
    -         supplier_key | int4
    -         full_price   | float8
    -
    -        PXF Hive Table "default.testabc"
    -         Column | Type 
    -        --------+------
    -         type   | text
    -         name   | text
    -
    -        PXF Hive Table "userdb.test"
    -          Column  | Type 
    -        ----------+------
    -         address  | text
    -         username | text
    -         
    -        ```
    -
    -    **Note:** When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql` compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
    -
    -5.  Alternatively, you can use the **pxf\_get\_item\_fields** user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters.
    -
    -    **Note:** Currently the only supported input profile is `'Hive'`.
    -
    -    For example, the following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
    +``` sql
    +postgres=# SELECT * FROM hcatalog.default.sales_info;
    +```
    +
    +To obtain a description of a Hive table with HCatalog integration, you can use the `psql` client interface.
    +
    +-   Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` commands to describe a single table. For example, from the `psql` client interface:
    +
    +    ``` shell
    +    $ psql -d postgres
    +    ```
     
         ``` sql
    -    postgres=# select * from pxf_get_item_fields('Hive','default.test');
    +    postgres=# \d hcatalog.default.sales_info_rcfile
         ```
    -    
    -    ``` pre
    -      path   | itemname |  fieldname   | fieldtype 
    -    ---------+----------+--------------+-----------
    -     default | test     | name         | text
    -     default | test     | type         | text
    -     default | test     | supplier_key | int4
    -     default | test     | full_price   | float8
    -    (4 rows)
    +
    +    ``` shell
    +    PXF Hive Table "default.sales_info_rcfile"
    +          Column      |  Type
    +    ------------------+--------
    +     location         | text
    +     month            | text
    +     number_of_orders | int4
    +     total_sales      | float8
         ```
    +-   Use `\d hcatalog.hive-db-name.*` to describe the whole database schema, i.e., all tables in `hive-db-name`.
    +-   Use `\d hcatalog.*.*` to describe the whole schema, i.e., all databases and tables.
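    +
    +For example, using the `default` database from the examples above:
    +
    +``` sql
    +postgres=# \d hcatalog.default.*
    +postgres=# \d hcatalog.*.*
    +```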
     
    -    The following statement returns table descriptions from the default database.
    +When using `\d` or `\d+` commands in the `psql` HAWQ client, `hcatalog` will not be listed as a database. If you use other `psql`-compatible clients, `hcatalog` will be listed as a database with a size value of `-1` since `hcatalog` is not a real database in HAWQ.
    +
    +Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain Hive table descriptions from other client interfaces or third-party applications. The UDF takes a PXF profile and a table pattern string as its input parameters. **Note:** The only supported input profile at this time is `'Hive'`.
    +
    +- The following statement returns a description of a specific table. The description includes path, itemname (table), fieldname, and fieldtype.
     
         ``` sql
    -    postgres=# select * from pxf_get_item_fields('Hive','default.*');
    +    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info_rcfile');
         ```
    -    
    +
         ``` pre
    -      path   | itemname |  fieldname   | fieldtype 
    -    ---------+----------+--------------+-----------
    -     default | test     | name         | text
    -     default | test     | type         | text
    -     default | test     | supplier_key | int4
    -     default | test     | full_price   | float8
    -     default | testabc  | name         | text
    -     default | testabc  | type         | text
    -    (6 rows)
    +      path   |     itemname      |    fieldname     | fieldtype
    +    ---------+-------------------+------------------+-----------
    +     default | sales_info_rcfile | location         | text
    +     default | sales_info_rcfile | month            | text
    +     default | sales_info_rcfile | number_of_orders | int4
    +     default | sales_info_rcfile | total_sales      | float8
         ```
     
    -    The following statement returns a description of the entire schema.
    +- The following statement returns table descriptions from the default database.
     
         ``` sql
    -    postgres=# select * from pxf_get_item_fields('Hive', '*.*');
    +    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
         ```
    -    
    -    ``` pre
    -      path   | itemname |  fieldname   | fieldtype 
    -    ---------+----------+--------------+-----------
    -     default | test     | name         | text
    -     default | test     | type         | text
    -     default | test     | supplier_key | int4
    -     default | test     | full_price   | float8
    -     default | testabc  | name         | text
    -     default | testabc  | type         | text
    -     userdb  | test     | username     | text
    -     userdb  | test     | address      | text
    -    (8 rows)
    +
    +- The following statement returns a description of the entire schema.
    +
    +    ``` sql
    +    postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
         ```
     
     ### <a id="topic_r5k_pst_25"></a>Limitations
     
     HCatalog integration has the following limitations:
     
    --   HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive instead. (See [Hive Complex Types](#topic_b4v_g3n_25) for example.)
    --   Even for primitive types, HCatalog metadata descriptions produced by `\d` and` \d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Types](../reference/HAWQDataTypes.html) for a list of data types in HAWQ.)
    --   HAWQ reserves the database name "hcatalog" for system use. You cannot connect to or alter the system "hcatalog" database.
    +-   HCatalog integration queries and describe commands do not support complex types; only primitive types are supported. Use PXF external tables to query complex types in Hive. (See [Complex Types Example](#complex_dt_example).)
    +-   Even for primitive types, HCatalog metadata descriptions produced by `\d` and `\d+` are converted to HAWQ types. For example, the Hive type `tinyint` is converted to HAWQ type `int2`. (See [Data Type Mapping](#hive_primdatatypes).)
    +-   HAWQ reserves the database name `hcatalog` for system use. You cannot connect to or alter the system `hcatalog` database.
     
     ## <a id="partitionfiltering"></a>Partition Filtering
     
    -The PXF Hive plug-in uses the Hive partitioning feature and directory structure. This enables partition exclusion on HDFS files that contain the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a WHERE clause that refers to a specific partition in the partitioned Hive table.
    +The PXF Hive plug-in supports the Hive partitioning feature and directory structure. This enables partition exclusion on selected HDFS files comprising the Hive table. To use the partition filtering feature to reduce network traffic and I/O, run a PXF query using a `WHERE` clause that refers to a specific partition in the partitioned Hive table.
     
    -To take advantage of PXF partition filtering push-down, name the partition fields in the external table. These names must be the same as the names stored in the Hive table. Otherwise, PXF ignores Partition filtering and the filtering is performed on the HAWQ side, impacting performance.
    +To take advantage of PXF partition filtering push-down, the Hive and PXF partition field names must be the same. Otherwise, PXF ignores partition filtering and the filtering is performed on the HAWQ side, impacting performance.
     
    -**Note:** The Hive plug-in only filters on partition columns, not on other table attributes.
    +**Note:** The Hive plug-in filters only on partition columns, not on other table attributes.
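    +
    +A minimal sketch of such a query, assuming a PXF external table `pxf_sales_part` created over the `sales_part` Hive table below (the external table name, NameNode host, and PXF port are illustrative):
    +
    +``` sql
    +postgres=# CREATE EXTERNAL TABLE pxf_sales_part (name text, type text, supplier_key int,
    +             price float8, delivery_state text, delivery_city text)
    +           LOCATION ('pxf://namenode:51200/sales_part?PROFILE=Hive')
    +           FORMAT 'CUSTOM' (formatter='pxfwritable_import');
    +postgres=# SELECT * FROM pxf_sales_part WHERE delivery_state = 'NEVADA';
    +```
    +
    +The `WHERE` clause on the `delivery_state` partition column lets PXF skip the HDFS directories of all other partitions.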
     
    -### <a id="example2"></a>Example
    +### <a id="example2"></a>Create Partitioned Hive Table
     
     Create a Hive table `sales_part` with two partition columns, `delivery_state` and `delivery_city`:
     
     ``` sql
    -hive> CREATE TABLE sales_part (name STRING, type STRING, supplier_key INT, price DOUBLE)
    -PARTITIONED BY (delivery_state STRING, delivery_city STRING)
    -ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
    +hive> CREATE TABLE sales_part (name string, type string, supplier_key int, price double)
    +        PARTITIONED BY (delivery_state string, delivery_city string)
    +        ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
     ```
     
     Load data into this Hive table and add some partitions:
     
     ``` sql
    -hive> LOAD DATA LOCAL INPATH '/local/path/data1.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'San Francisco');
    -hive> LOAD DATA LOCAL INPATH '/local/path/data2.txt' INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Sacramento');
    -hive> LOAD DATA LOCAL INPATH '/local/path/data3.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Reno');
    -hive> LOAD DATA LOCAL INPATH '/local/path/data4.txt' INTO TABLE sales_part PARTITION(delivery_state = 'NEVADA', delivery_city = 'Las Vegas');
    +hive> INSERT INTO TABLE sales_part PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno') VALUES ('block', 'widget', 33, 15.17);
    --- End diff --
    
    Might be nice to add line breaks and spacing as with the previous code block, so readers don't have to scroll horizontally.
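    
    For example, the same statement could be wrapped like this (formatting only; no content change):
    
    ``` sql
    hive> INSERT INTO TABLE sales_part
            PARTITION(delivery_state = 'CALIFORNIA', delivery_city = 'Fresno')
            VALUES ('block', 'widget', 33, 15.17);
    ```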

