hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #17: Updates for hawq register
Date Fri, 30 Sep 2016 18:55:45 GMT
Github user dyozie commented on a diff in the pull request:

    --- Diff: datamgmt/load/g-register_files.html.md.erb ---
    @@ -0,0 +1,213 @@
    +title: Registering Files into HAWQ Internal Tables
    +The `hawq register` utility loads and registers HDFS data files or folders into HAWQ
internal tables. Files can be read directly, rather than having to be copied or loaded, resulting
in higher performance and more efficient transaction processing.
    +Data from the file or directory specified by \<hdfsfilepath\> is loaded into the
appropriate HAWQ table directory in HDFS and the utility updates the corresponding HAWQ metadata
for the files. Either AO or Parquet-formatted files in HDFS can be loaded into a corresponding
table in HAWQ.
    +You can use `hawq register` either to:
    +-  Load and register external Parquet-formatted file data generated by an external system
such as Hive or Spark.
    +-  Recover cluster data from a backup cluster for disaster recovery. 
    +Requirements for running `hawq register` on the client server are:
    +-   Network access to and from all hosts in your HAWQ cluster (master and segments) and
the hosts where the data to be loaded is located.
    +-   The Hadoop client is configured and the HDFS file path is specified.
    +-   The files to be registered and the HAWQ table must be located in the same HDFS cluster.
    +-   The target table DDL is configured with the correct data type mapping.
    +##Registering Externally Generated HDFS File Data to an Existing Table<a id="topic1__section2"></a>
    +Files or folders in HDFS can be registered into an existing table, allowing them to be
managed as a HAWQ internal table. When registering files, you can optionally specify the maximum
amount of data to be loaded, in bytes, using the `--eof` option. If registering a folder,
the actual file sizes are used. 
    +Only HAWQ or Hive-generated Parquet tables are supported. Partitioned tables are not
supported. Attempting to register these tables will result in an error.
    +Metadata for the Parquet file(s) and the destination table must be consistent. Different
data types are used by HAWQ tables and Parquet files, so data must be mapped. You must verify
that the structures of the Parquet files and the HAWQ table are compatible before running `hawq
    +We recommend creating a copy of the Parquet file to be registered before running ```hawq
    --- End diff --
    Change "We recommend creating" to "As a best practice, create"

