hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #94: HAWQ-1304 - multiple doc changes for P...
Date Wed, 08 Feb 2017 22:20:14 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/94#discussion_r100185543
  
    --- Diff: markdown/pxf/HivePXF.html.md.erb ---
    @@ -164,9 +164,143 @@ Create a Hive table to expose our sample data set.
     
     In examples later in this section, you will access the `sales_info` Hive table directly
via PXF. You will also insert `sales_info` data into tables of other Hive file format types,
and use PXF to access those directly as well.
     
    +
    +## <a id="hcatalog"></a>Using PXF and HCatalog to Query Hive
    +
    +You can query Hive tables directly through HCatalog integration with HAWQ and PXF, regardless
of the underlying file storage format. This integration allows HAWQ to directly use table
metadata stored in HCatalog.
    +
    +HCatalog is built on top of the Hive metastore and incorporates Hive's DDL. This provides
several advantages:
    +
    +-   You do not need to know the table schema of your Hive tables
    +-   You do not need to manually enter information about Hive table location or format
    +-   If Hive table metadata changes, HCatalog provides updated metadata. This is in contrast
to the use of static external PXF tables to define Hive table metadata for HAWQ.
    +
    +The following diagram depicts how HAWQ integrates with HCatalog to query Hive tables:
    +
    +<img src="../images/hawq_hcatalog.png" id="hcatalog__image_ukw_h2v_c5" class="image"
width="672" />
    +
    +1.  HAWQ retrieves table metadata from HCatalog using PXF.
    +2.  HAWQ creates in-memory catalog tables from the retrieved metadata. If a table is
referenced multiple times in a transaction, HAWQ uses its in-memory metadata to reduce external
calls to HCatalog.
    +3.  PXF queries Hive using table metadata that is stored in the HAWQ in-memory catalog
tables. Table metadata is dropped at the end of the transaction.
    +
    +
    +### <a id="topic_j1l_enabling"></a>Enabling HCatalog Integration
    +
    +To enable HCatalog query integration in HAWQ, perform the following steps:
    +
    +1.  Make sure your deployment meets the requirements listed in [Prerequisites](#installingthepxfhiveplugin).
    +2.  If necessary, set the `pxf_service_address` global configuration property to the
hostname or IP address and port where you have installed the PXF Hive plug-in. By default,
the value is set to `localhost:51200`.
    +
    +    ``` sql
    +    postgres=# SET pxf_service_address TO '<hivenode>:51200';
    +    ```
    +
    +3.  HCatalog integration internally uses the `pxf` protocol to query. Grant privileges on this protocol to all roles that require HCatalog access:
    +
    +    ``` sql
    +    postgres=# GRANT ALL ON PROTOCOL pxf TO <role>;
    +    ```
    +
    +4. Creating a HAWQ table with the `WITH (OIDS)` clause is not recommended. If any user tables were created with this clause, additional operations are required to enable HCatalog integration: to access a Hive table via HCatalog, HAWQ users must have `SELECT` privilege on every user table in the same schema that was created with the `WITH (OIDS)` clause.
    +
    +    1. Determine which user tables were created using the `WITH (OIDS)` clause:
    +
    +        ``` sql
    +        postgres=# SELECT oid, relname FROM pg_class
    +                     WHERE relhasoids = true
    +                       AND relnamespace <> (SELECT oid FROM pg_namespace WHERE
nspname = 'pg_catalog');
    +        ```
    +
    +    2. Grant `SELECT` privilege on all returned tables to all roles to which you chose
to provide HCatalog query access. For example:
    +
    +        ``` sql
    +        postgres=# GRANT SELECT ON <table-created-WITH-OIDS> TO <role>;
    +        ```
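    +    The two sub-steps above can be combined. The following query (a sketch, assuming a hypothetical role named `analyst`) generates the required `GRANT` statements for every table returned by the first query:
    +
    +        ``` sql
    +        postgres=# SELECT 'GRANT SELECT ON ' || relname || ' TO analyst;'
    +                     FROM pg_class
    +                     WHERE relhasoids = true
    +                       AND relnamespace <> (SELECT oid FROM pg_namespace WHERE nspname = 'pg_catalog');
    +        ```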
    +
    +### <a id="topic_j1l_y55_c5"></a>Usage
    +
    +To query a Hive table with HCatalog integration, query HCatalog directly from HAWQ. The
query syntax is:
    +
    +``` sql
    +postgres=# SELECT * FROM hcatalog.hive-db-name.hive-table-name;
    +```
    +
    +For example:
    +
    +``` sql
    +postgres=# SELECT * FROM hcatalog.default.sales_info;
    +```
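    +As described above, when a table is referenced multiple times in a single transaction, HAWQ serves the table metadata from its in-memory catalog rather than issuing repeated calls to HCatalog. A sketch of this pattern:
    +
    +``` sql
    +postgres=# BEGIN;
    +postgres=# SELECT COUNT(*) FROM hcatalog.default.sales_info;
    +postgres=# SELECT MAX(total_sales) FROM hcatalog.default.sales_info; -- metadata reused from the in-memory catalog
    +postgres=# COMMIT;
    +```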
    +
    +To obtain a description of a Hive table with HCatalog integration, you can use the `psql`
client interface.
    +
    +-   Within HAWQ, use either the `\d hcatalog.hive-db-name.hive-table-name` or `\d+ hcatalog.hive-db-name.hive-table-name` command to describe a single table. `\d` displays only HAWQ's interpretation of the underlying source (Hive in this case) data type, while `\d+` displays both the HAWQ-interpreted and Hive source data types. For example, from the `psql` client interface:
    +
    +    ``` shell
    +    $ psql -d postgres
    +    ```
    +
    +    ``` sql
    +    postgres=# \d+ hcatalog.default.sales_info;
    +    ```
    +
    +    ``` shell
    +       PXF Hive Table "default.sales_info"
    +          Column      |  Type  | Source type 
    +    ------------------+--------+-------------
    +     location         | text   | string
    +     month            | text   | string
    +     number_of_orders | int4   | int
    +     total_sales      | float8 | double
    +    ```
    +-   Use `\d hcatalog.hive-db-name.*` to describe all tables in `hive-db-name`.
    +-   Use `\d hcatalog.*.*` to describe all tables in all Hive databases.
    +
    +When you use the `\d` or `\d+` command in the HAWQ `psql` client, `hcatalog` is not listed as a database. Other `psql`-compatible clients list `hcatalog` as a database with a size value of `-1`, since `hcatalog` is not a real database in HAWQ.
    +
    +Alternatively, you can use the `pxf_get_item_fields` user-defined function (UDF) to obtain
Hive table descriptions from other client interfaces or third-party applications. The UDF
takes a PXF profile and a table pattern string as its input parameters.  **Note:** The only
supported input profile at this time is `'Hive'`.
    +
    +- The following statement returns a description of a specific table. The description
includes path, itemname (table), fieldname, fieldtype (HAWQ type), and sourcefieldtype (Hive
type).
    +
    +    ``` sql
    +    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.sales_info');
    +    ```
    +
    +    ``` pre
    +      path   |  itemname  |    fieldname     | fieldtype | sourcefieldtype 
    +    ---------+------------+------------------+-----------+-----------------
    +     default | sales_info | location         | text      | string
    +     default | sales_info | month            | text      | string
    +     default | sales_info | number_of_orders | int4      | int
    +     default | sales_info | total_sales      | float8    | double
    +    ```
    +
    +- The following statement returns table descriptions from the default database.
    +
    +    ``` sql
    +    postgres=# SELECT * FROM pxf_get_item_fields('Hive','default.*');
    +    ```
    +
    +- The following statement returns a description of the entire schema.
    +
    +    ``` sql
    +    postgres=# SELECT * FROM pxf_get_item_fields('Hive', '*.*');
    +    ```
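    +Because `pxf_get_item_fields` returns ordinary rows, its output can be filtered like any other query result. For example, the following statement (a sketch) lists only those columns whose HAWQ type name differs from the Hive source type name:
    +
    +``` sql
    +postgres=# SELECT itemname, fieldname, fieldtype, sourcefieldtype
    +             FROM pxf_get_item_fields('Hive', 'default.*')
    +             WHERE fieldtype <> sourcefieldtype;
    +```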
    +
    +### <a id="topic_r5k_pst_25"></a>Limitations
    +
    +HCatalog integration has the following limitations:
    +
    +-   HCatalog queries on Hive tables with complex type fields return those fields serialized
as text.
    +-   Even for primitive types, HCatalog metadata descriptions produced by `\d` are HAWQ's
interpretation of the underlying Hive data types. For example, the Hive type `tinyint` is
converted to HAWQ type `int2`. (See [Data Type Mapping](#hive_primdatatypes).)
    +-   HAWQ reserves the database name `hcatalog` for system use. You cannot connect to
or alter the system `hcatalog` database.
    +
    +
    +
     ## <a id="topic_p2s_lvl_28"></a>Querying External Hive Data
     
    -The PXF Hive plug-in supports several Hive-related profiles. These include `Hive`, `HiveText`,
and `HiveRC`.
    +In the previous section, you used HCatalog integration to query a Hive table. You can
also create a PXF/HAWQ external table to access Hive table data. This Hive table access mechanism
requires that you identify an appropriate Hive profile.
    +
    +The PXF Hive plug-in supports several Hive-related profiles. These include `Hive`, `HiveText`,
and `HiveRC`. `HiveText` and `HiveRC` profiles are specifically optimized for text and RC
file formats, respectively. The `Hive` profile is optimized for all file storage types; use
the `Hive` profile when the underlying Hive table is composed of multiple partitions with
differing file formats.
    --- End diff ---
    
    Third sentence here should start with "The"


