hawq-commits mailing list archives

From yo...@apache.org
Subject incubator-hawq-docs git commit: This closes #59 - Revisions to HAWQ Best Practices topics.
Date Tue, 15 Nov 2016 23:59:56 GMT
Repository: incubator-hawq-docs
Updated Branches:
  refs/heads/develop 740b6ee69 -> 9f4293ba4


This closes #59 - Revisions to HAWQ Best Practices topics.


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/9f4293ba
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/9f4293ba
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/9f4293ba

Branch: refs/heads/develop
Commit: 9f4293ba40edad95b1eca1d9dfe04f22d3208afa
Parents: 740b6ee
Author: David Yozie <yozie@apache.org>
Authored: Tue Nov 15 15:59:09 2016 -0800
Committer: David Yozie <yozie@apache.org>
Committed: Tue Nov 15 15:59:09 2016 -0800

----------------------------------------------------------------------
 .../HAWQBestPracticesOverview.html.md.erb       |  3 ---
 .../operating_hawq_bestpractices.html.md.erb    | 13 ++++++++--
 .../querying_data_bestpractices.html.md.erb     | 24 +++++++++++++++---
 query/query-performance.html.md.erb             | 26 ++++++++++++++------
 4 files changed, 50 insertions(+), 16 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/HAWQBestPracticesOverview.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/HAWQBestPracticesOverview.html.md.erb b/bestpractices/HAWQBestPracticesOverview.html.md.erb
index 6277727..13b4dca 100644
--- a/bestpractices/HAWQBestPracticesOverview.html.md.erb
+++ b/bestpractices/HAWQBestPracticesOverview.html.md.erb
@@ -4,9 +4,6 @@ title: Best Practices
 
 This chapter provides best practices on using the components and features that are part of a HAWQ system.
 
--   **[HAWQ Best Practices](../bestpractices/general_bestpractices.html)**
-
-    This topic addresses general best practices for using HAWQ.
 
 -   **[Best Practices for Operating HAWQ](../bestpractices/operating_hawq_bestpractices.html)**
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/operating_hawq_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/operating_hawq_bestpractices.html.md.erb b/bestpractices/operating_hawq_bestpractices.html.md.erb
index d48cf82..9dc56e9 100644
--- a/bestpractices/operating_hawq_bestpractices.html.md.erb
+++ b/bestpractices/operating_hawq_bestpractices.html.md.erb
@@ -4,6 +4,16 @@ title: Best Practices for Operating HAWQ
 
 This topic provides best practices for operating HAWQ, including recommendations for stopping, starting and monitoring HAWQ.
 
+## <a id="best_practice_config"></a>Best Practices for Configuring HAWQ Parameters
+
+The HAWQ configuration guc/parameters are located in `$GPHOME/etc/hawq-site.xml`. This configuration file resides on all HAWQ instances and can be modified either by the Ambari interface or the command line. 
+
+If you install and manage HAWQ using Ambari, use the Ambari interface for all configuration changes. Do not use command line utilities such as `hawq config` to set or change HAWQ configuration properties for Ambari-managed clusters. Configuration changes to `hawq-site.xml` made outside the Ambari interface will be overwritten when you restart or reconfigure HAWQ using Ambari.
+
+If you manage your cluster using command line tools instead of Ambari, use a consistent `hawq-site.xml` file to configure your entire cluster. 
+
+**Note:** While `postgresql.conf` still exists in HAWQ, any parameters defined in `hawq-site.xml` will overwrite configurations in `postgresql.conf`. For this reason, we recommend that you only use `hawq-site.xml` to configure your HAWQ cluster. For Ambari clusters, always use Ambari for configuring `hawq-site.xml` parameters.
+
 ## <a id="task_qgk_bz3_1v"></a>Best Practices to Start/Stop HAWQ Cluster Members
 
 For best results in using `hawq start` and `hawq stop` to manage your HAWQ system, the following best practices are recommended.
@@ -85,7 +95,6 @@ WHERE status &lt;&gt; &#39;u&#39;;</code></pre></td>
 <ol>
 <li>Verify that the hosts with down segments are responsive.</li>
 <li>If hosts are OK, check the <span class="ph filepath">pg_log</span> files for the down segments to discover the root cause of the segments going down.</li>
-<li>If no unexpected errors are found, run the <code class="ph codeph">gprecoverseg</code> utility to bring the segments back online.</li>
 </ol></td>
 </tr>
 </tbody>
@@ -116,7 +125,7 @@ WHERE status &lt;&gt; &#39;u&#39;;</code></pre></td>
 <p>Recommended frequency: real-time, if possible, or every 15 minutes</p>
 <p>Severity: CRITICAL</p></td>
 <td>Set up system check for hardware and OS errors.</td>
-<td>If required, remove a machine from the HAWQ cluster to resolve hardware and OS issues, then, after add it back to the cluster and run <code class="ph codeph">gprecoverseg</code>.</td>
+<td>If required, remove a machine from the HAWQ cluster to resolve hardware and OS issues, then add it back to the cluster after the issues are resolved.</td>
 </tr>
 <tr class="even">
 <td>Check disk space usage on volumes used for HAWQ data storage and the OS.

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/bestpractices/querying_data_bestpractices.html.md.erb
----------------------------------------------------------------------
diff --git a/bestpractices/querying_data_bestpractices.html.md.erb b/bestpractices/querying_data_bestpractices.html.md.erb
index e2fb983..3efe569 100644
--- a/bestpractices/querying_data_bestpractices.html.md.erb
+++ b/bestpractices/querying_data_bestpractices.html.md.erb
@@ -4,6 +4,25 @@ title: Best Practices for Querying Data
 
 To obtain the best results when querying data in HAWQ, review the best practices described in this topic.
 
+## <a id="virtual_seg_performance"></a>Factors Impacting Query Performance
+
+The number of virtual segments used for a query directly impacts the query's performance. The following factors can impact the degree of parallelism of a query:
+
+-   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Some techniques used in defining resource queues can influence the number of both virtual segments and general resources allocated to queries. For more information, see [Best Practices for Using Resource Queues](managing_resources_bestpractices.html#topic_hvd_pls_wv).
+-   **Available resources at query time**. If more resources are available in the resource queue, those resources will be used.
+-   **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions: 
+ 
+  	- The bucket number (bucketnum) configured for all the hash tables is the same for all tables 
+   - The table size for random tables is no more than 1.5 times the size allotted for the hash tables. 
+
+  Otherwise, the number of virtual segments depends on the query's cost: hash-distributed table queries behave like queries on randomly-distributed tables.
+  
+-   **Query Type**: It can be difficult to calculate  resource costs for queries with some user-defined functions or for queries to external tables. With these queries,  the number of virtual segments is controlled by the  `hawq_rm_nvseg_perquery_limit `and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+
+  ***Note:*** PXF external tables use the `default_hash_table_bucket_number` parameter, not the `hawq_rm_nvseg_perquery_perseg_limit` parameter, to control the number of virtual segments.
+
+See [Query Performance](../query/query-performance.html#topic38) for more details.
+
 ## <a id="id_xtk_jmq_1v"></a>Examining Query Plans to Solve Problems
 
 If a query performs poorly, examine its query plan and ask the following questions:
@@ -20,8 +39,5 @@ If a query performs poorly, examine its query plan and ask the following questio
 
     `Work_mem used: 23430K bytes avg, 23430K bytes max (seg0). Work_mem wanted: 33649K bytes avg, 33649K bytes max (seg0) to lessen workfile I/O affecting 2               workers.`
 
-The "bytes wanted" (Work_mem) message from `EXPLAIN ANALYZE` is based on the amount of data written to work files and is not exact.
-
-**Note**
-The *work\_mem* property is not configurable. Use resource queues to manage memory use. For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html).
+  **Note:** The "bytes wanted" (*work\_mem* property) is based on the amount of data written to work files and is not exact. This property is not configurable. Use resource queues to manage memory use. For more information on resource queues, see [Configuring Resource Management](../resourcemgmt/ConfigureResourceManagement.html) and [Working with Hierarchical Resource Queues](../resourcemgmt/ResourceQueues.html).
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/9f4293ba/query/query-performance.html.md.erb
----------------------------------------------------------------------
diff --git a/query/query-performance.html.md.erb b/query/query-performance.html.md.erb
index e3aa8f7..981d77b 100644
--- a/query/query-performance.html.md.erb
+++ b/query/query-performance.html.md.erb
@@ -118,18 +118,30 @@ The following table describes the metrics related to data locality. Use these me
 
 ## <a id="topic_wv3_gzc_d5"></a>Number of Virtual Segments
 
-The number of virtual segment used has impacts on the query performance. HAWQ decides the number of virtual segments of a query (its parallelism) by using the following rules:
+To obtain the best results when querying data in HAWQ, review the best practices described in this topic.
 
--   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Note that there are some techniques you can use when defining resource queues to influence the number of virtual segments and general resources that are allocated to queries. See [Best Practices for Using Resource Queues](../bestpractices/managing_resources_bestpractices.html#topic_hvd_pls_wv).
--   **Available resources**. Resources available at query time. If more resources are available in the resource queue, the resources will be used.
--   **Hash table and bucket number**. If the query involves only hash-distributed tables, and the bucket number (bucketnum) configured for all the hash tables is either the same bucket number for all tables or the table size for random tables is no more than 1.5 times larger than the size of hash tables for the hash tables, then the query's parallelism is fixed (equal to the hash table bucket number). Otherwise, the number of virtual segments depends on the query's cost and hash-distributed table queries will behave like queries on randomly distributed tables.
--   **Query Type**: For queries with some user-defined functions or for external tables where calculating resource costs is difficult , then the number of virtual segments is controlled by `hawq_rm_nvseg_perquery_limit `and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`) then the number of virtual segment number must be equal to the bucket number of the resulting hash table, If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies, which will be explained later in this section.
+### <a id="virtual_seg_performance"></a>Factors Impacting Query Performance
 
-The following are guidelines for numbers of virtual segments to use, provided there are sufficient resources available.
+The number of virtual segments used for a query directly impacts the query's performance. The following factors can impact the degree of parallelism of a query:
+
+-   **Cost of the query**. Small queries use fewer segments and larger queries use more segments. Some techniques used in defining resource queues can influence the number of both virtual segments and general resources allocated to queries.
+-   **Available resources at query time**. If more resources are available in the resource queue, those resources will be used.
+-   **Hash table and bucket number**. If the query involves only hash-distributed tables, the query's parallelism is fixed (equal to the hash table bucket number) under the following conditions:
+
+   - The bucket number (bucketnum) configured for all the hash tables is the same bucket number
+   - The table size for random tables is no more than 1.5 times the size allotted for the hash tables.
+
+  Otherwise, the number of virtual segments depends on the query's cost: hash-distributed table queries behave like queries on randomly-distributed tables.
+
+-   **Query Type**: It can be difficult to calculate  resource costs for queries with some user-defined functions or for queries to external tables. With these queries,  the number of virtual segments is controlled by the  `hawq_rm_nvseg_perquery_limit `and `hawq_rm_nvseg_perquery_perseg_limit` parameters, as well as by the ON clause and the location list of external tables. If the query has a hash result table (e.g. `INSERT into hash_table`), the number of virtual segments must be equal to the bucket number of the resulting hash table. If the query is performed in utility mode, such as for `COPY` and `ANALYZE` operations, the virtual segment number is calculated by different policies.
+
+###General Guidelines
+
+The following guidelines expand on the numbers of virtual segments to use, provided there are sufficient resources available.
 
 -   **Random tables exist in the select list:** \#vseg (number of virtual segments) depends on the size of the table.
 -   **Hash tables exist in the select list:** \#vseg depends on the bucket number of the table.
--   **Random and hash tables both exist in the select list:** \#vseg depends on the bucket number of the table, if the table size of random tables is no more than 1.5 times larger than the size of hash tables. Otherwise, \#vseg depends on the size of the random table.
+-   **Random and hash tables both exist in the select list:** \#vseg depends on the bucket number of the table, if the table size of random tables is no more than 1.5 times the size of hash tables. Otherwise, \#vseg depends on the size of the random table.
 -   **User-defined functions exist:** \#vseg depends on the `hawq_rm_nvseg_perquery_limit` and `hawq_rm_nvseg_perquery_perseg_limit` parameters.
 -   **PXF external tables exist:** \#vseg depends on the `default_hash_table_bucket_number` parameter.
 -   **gpfdist external tables exist:** \#vseg is at least the number of locations in the location list.
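
Reader's note on the final hunk: the first three "General Guidelines" bullets can be read as a small decision rule. The Python sketch below is illustrative only. The names `VsegDriver` and `vseg_driver` are hypothetical, this is not HAWQ code, and it reports only which factor the guidelines say \#vseg depends on, not an actual segment count. It assumes table sizes are given in bytes and that a size of 0 means the query reads no table of that kind.

```python
from enum import Enum


class VsegDriver(Enum):
    """Factor that #vseg depends on, per the guidelines (labels are illustrative)."""
    RANDOM_TABLE_SIZE = "size of the random table"
    HASH_BUCKET_NUMBER = "bucket number of the hash table"


def vseg_driver(random_table_bytes: int, hash_table_bytes: int) -> VsegDriver:
    """Return the factor the guidelines name for a query over these tables.

    Pass 0 for a kind of table that the query does not read.
    """
    if random_table_bytes < 0 or hash_table_bytes < 0:
        raise ValueError("table sizes must be non-negative")
    if random_table_bytes == 0 and hash_table_bytes == 0:
        raise ValueError("the query must read at least one table")
    if hash_table_bytes == 0:
        # Only random tables in the select list.
        return VsegDriver.RANDOM_TABLE_SIZE
    if random_table_bytes == 0:
        # Only hash tables in the select list.
        return VsegDriver.HASH_BUCKET_NUMBER
    # Both kinds: the bucket number applies while the random tables are
    # no more than 1.5 times the size of the hash tables.
    if random_table_bytes <= 1.5 * hash_table_bytes:
        return VsegDriver.HASH_BUCKET_NUMBER
    return VsegDriver.RANDOM_TABLE_SIZE
```

For example, `vseg_driver(150, 100)` falls on the 1.5x boundary and so selects the bucket number, while `vseg_driver(151, 100)` selects the random table size. User-defined functions and external tables are outside this sketch because the guidelines tie them to configuration parameters and location lists rather than to table sizes.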

