hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #17: Updates for hawq register
Date Fri, 30 Sep 2016 18:55:45 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/17#discussion_r81391750
  
    --- Diff: reference/cli/admin_utilities/hawqregister.html.md.erb ---
    @@ -2,102 +2,83 @@
     title: hawq register
     ---
     
    -Loads and registers external parquet-formatted data in HDFS into a corresponding table in HAWQ.
    +Loads and registers AO or Parquet-formatted data in HDFS into a corresponding table in HAWQ.
     
     ## <a id="topic1__section2"></a>Synopsis
     
     ``` pre
    -hawq register <databasename> <tablename> <hdfspath> 
    +Usage 1:
    +hawq register [<connection_options>] [-f <hdfsfilepath>] [-e <eof>] <tablename>
    +
    +Usage 2:
    +hawq register [<connection_options>] [-c <configfilepath>][--force] <tablename>
    +
    +Connection Options:
          [-h <hostname>] 
          [-p <port>] 
          [-U <username>] 
          [-d <database>]
    -     [-t <tablename>] 
    +     
    +Misc. Options:
          [-f <filepath>] 
    +     [-e <eof>]
    +     [--force] 
          [-c <yml_config>]  
     hawq register help | -? 
     hawq register --version
     ```
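    +
    +For example (hypothetical host, database, table, and file names, using only the options shown above), to register a single Parquet file in HDFS into the existing table `sales` with Usage 1:
    +
    +``` pre
    +hawq register -h localhost -p 5432 -U gpadmin -d postgres \
    +    -f /hawq_data/sales/part-00000.parquet sales
    +```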
     
     ## <a id="topic1__section3"></a>Prerequisites
     
    -The client machine where `hawq register` is executed must have the following:
    +The client machine where `hawq register` is executed must meet the following conditions:
     
     -   Network access to and from all hosts in your HAWQ cluster (master and segments) and the hosts where the data to be loaded is located.
    +-   The Hadoop client must be configured and the HDFS file path specified.
     -   The files to be registered and the HAWQ table located in the same HDFS cluster.
     -   The target table DDL is configured with the correct data type mapping.
     
     ## <a id="topic1__section4"></a>Description
     
    -`hawq register` is a utility that loads and registers existing or external parquet data in HDFS into HAWQ, so that it can be directly ingested and accessed through HAWQ. Parquet data from the file or directory in the specified path is loaded into the appropriate HAWQ table directory in HDFS and the utility updates the corresponding HAWQ metadata for the files.

    +`hawq register` is a utility that loads and registers existing data files or folders in HDFS into HAWQ internal tables, allowing HAWQ to read the data directly and apply internal table processing, such as transaction support and high performance, without needing to load or copy it. Data from the file or directory specified by \<hdfsfilepath\> is loaded into the appropriate HAWQ table directory in HDFS, and the utility updates the corresponding HAWQ metadata for the files. 
     
    -Only parquet tables can be loaded using the `hawq register` command. Metadata for the parquet file(s) and the destination table must be consistent. Different data types are used by HAWQ tables and parquet tables, so the data is mapped. You must verify that the structure of the parquet files and the HAWQ table are compatible before running `hawq register`. 
    +You can use `hawq register` to:
     
    -Note: only HAWQ or HIVE-generated parquet tables are currently supported.
    +-  Load and register external Parquet-formatted file data generated by an external system such as Hive or Spark.
    +-  Recover cluster data from a backup cluster.
     
    -###Limitations for Registering Hive Tables to HAWQ
    -The currently-supported data types for generating Hive tables into HAWQ tables are: boolean, int, smallint, tinyint, bigint, float, double, string, binary, char, and varchar.  
    +Two usage models are available.
     
    -The following HIVE data types cannot be converted to HAWQ equivalents: timestamp, decimal, array, struct, map, and union.   
    +###Usage Model 1: register file data to an existing table.
    --- End diff --
    
    Capitalize "Register"


