hawq-commits mailing list archives

From yo...@apache.org
Subject [3/7] incubator-hawq-docs git commit: Add register files link/info
Date Sat, 01 Oct 2016 00:26:16 GMT
Add register files link/info


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/aa65a9c5
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/aa65a9c5
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/aa65a9c5

Branch: refs/heads/master
Commit: aa65a9c5f7a14a5b02c838331f26e6db7c5e230e
Parents: cbc83e1
Author: Jane Beckman <jbeckman@pivotal.io>
Authored: Thu Sep 29 14:06:39 2016 -0700
Committer: Jane Beckman <jbeckman@pivotal.io>
Committed: Thu Sep 29 14:06:39 2016 -0700

----------------------------------------------------------------------
 .../load/g-loading-and-unloading-data.html.md   | 58 --------------------
 .../g-loading-and-unloading-data.html.md.erb    |  4 +-
 2 files changed, 3 insertions(+), 59 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/aa65a9c5/datamgmt/load/g-loading-and-unloading-data.html.md
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-loading-and-unloading-data.html.md b/datamgmt/load/g-loading-and-unloading-data.html.md
deleted file mode 100644
index 012e5b6..0000000
--- a/datamgmt/load/g-loading-and-unloading-data.html.md
+++ /dev/null
@@ -1,58 +0,0 @@
----
-title: Loading and Unloading Data
----
-
-The topics in this section describe methods for loading and writing data into and out of HAWQ, and how to format data files. It also covers registering HDFS files and folders directly into HAWQ internal tables.
-
-HAWQ supports high-performance parallel data loading and unloading, and for smaller amounts of data, single file, non-parallel data import and export.
-
-HAWQ can read from and write to several types of external data sources, including text files, Hadoop file systems, and web servers.
-
--   The `COPY` SQL command transfers data between an external text file on the master host and a HAWQ database table.
--   External tables allow you to query data outside of the database directly and in parallel using SQL commands such as `SELECT`, `JOIN`, or `SORT           EXTERNAL TABLE DATA`, and you can create views for external tables. External tables are often used to load external data into a regular database table using a command such as `CREATE TABLE table AS SELECT * FROM ext_table`.
--   External web tables provide access to dynamic data. They can be backed with data from URLs accessed using the HTTP protocol or by the output of an OS script running on one or more segments.
--   The `gpfdist` utility is the HAWQ parallel file distribution program. It is an HTTP server that is used with external tables to allow HAWQ segments to load external data in parallel, from multiple file systems. You can run multiple instances of `gpfdist` on different hosts and network interfaces and access them in parallel.
--   The `hawq load` utility automates the steps of a load task using a YAML-formatted control file.
-
-The method you choose to load data depends on the characteristics of the source data—its location, size, format, and any transformations required.
-
-In the simplest case, the `COPY` SQL command loads data into a table from a text file that is accessible to the HAWQ master instance. This requires no setup and provides good performance for smaller amounts of data. With the `COPY` command, the data copied into or out of the database passes between a single file on the master host and the database. This limits the total size of the dataset to the capacity of the file system where the external file resides and limits the data transfer to a single file write stream.
-
-More efficient data loading options for large datasets take advantage of the HAWQ MPP architecture, using the HAWQ segments to load data in parallel. These methods allow data to load simultaneously from multiple file systems, through multiple NICs, on multiple hosts, achieving very high data transfer rates. External tables allow you to access external files from within the database as if they are regular database tables. When used with `gpfdist`, the HAWQ parallel file distribution program, external tables provide full parallelism by using the resources of all HAWQ segments to load or unload data.
-
-The `hawq register` utility allows you to:
-
--  Load and register file data generated by an external system such as Hive or Spark into HAWQ internal tables.
--  Recover cluster data from a backup cluster for disaster recovery, using a YAML file.
-
-HAWQ leverages the parallel architecture of the Hadoop Distributed File System (HDFS) to access files on that system.
-
--   **[Working with File-Based External Tables](../../datamgmt/load/g-working-with-file-based-ext-tables.html)**
-
--   **[Using the Greenplum Parallel File Server (gpfdist)](../../datamgmt/load/g-using-the-greenplum-parallel-file-server--gpfdist-.html)**
-
--   **[Creating and Using Web External Tables](../../datamgmt/load/g-creating-and-using-web-external-tables.html)**
-
--   **[Loading Data Using an External Table](../../datamgmt/load/g-loading-data-using-an-external-table.html)**
-
--   **[Loading and Writing Non-HDFS Custom Data](../../datamgmt/load/g-loading-and-writing-non-hdfs-custom-data.html)**
-
--   **[Creating External Tables - Examples](../../datamgmt/load/creating-external-tables-examples.html#topic44)**
-
--   **[Handling Load Errors](../../datamgmt/load/g-handling-load-errors.html)**
-
--   **[Loading Data with hawq load](../../datamgmt/load/g-loading-data-with-hawqload.html)**
-
--   **[Loading Data with COPY](../../datamgmt/load/g-loading-data-with-copy.html)**
-
--   **[Running COPY in Single Row Error Isolation Mode](../../datamgmt/load/g-running-copy-in-single-row-error-isolation-mode.html)**
-
--   **[Optimizing Data Load and Query Performance](../../datamgmt/load/g-optimizing-data-load-and-query-performance.html)**
-
--   **[Unloading Data from HAWQ](../../datamgmt/load/g-unloading-data-from-greenplum-database.html)**
-
--   **[Transforming XML Data](../../datamgmt/load/g-transforming-xml-data.html)**
-
--   **[Formatting Data Files](../../datamgmt/load/g-formatting-data-files.html)**
-
-

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/aa65a9c5/datamgmt/load/g-loading-and-unloading-data.html.md.erb
----------------------------------------------------------------------
diff --git a/datamgmt/load/g-loading-and-unloading-data.html.md.erb b/datamgmt/load/g-loading-and-unloading-data.html.md.erb
index cbcab0b..6d27685 100644
--- a/datamgmt/load/g-loading-and-unloading-data.html.md.erb
+++ b/datamgmt/load/g-loading-and-unloading-data.html.md.erb
@@ -2,7 +2,7 @@
 title: Loading and Unloading Data
 ---
 
-The topics in this section describe methods for loading and writing data into and out of HAWQ, and how to format data files.
+The topics in this section describe methods for loading and writing data into and out of HAWQ, and how to format data files. It also covers registering HDFS files and folders directly into HAWQ internal tables.
 
 HAWQ supports high-performance parallel data loading and unloading, and for smaller amounts of data, single file, non-parallel data import and export.
 
@@ -30,6 +30,8 @@ HAWQ leverages the parallel architecture of the Hadoop Distributed File System t
 
 -   **[Loading Data Using an External Table](../../datamgmt/load/g-loading-data-using-an-external-table.html)**
 
+-   **[Registering Files into HAWQ Internal Tables](../../datamgmt/load/g-register_files.html)**
+
 -   **[Loading and Writing Non-HDFS Custom Data](../../datamgmt/load/g-loading-and-writing-non-hdfs-custom-data.html)**
 
 -   **[Creating External Tables - Examples](../../datamgmt/load/creating-external-tables-examples.html#topic44)**

