hawq-commits mailing list archives

From yo...@apache.org
Subject [21/50] incubator-hawq-docs git commit: make references DataNode consistent
Date Mon, 31 Oct 2016 22:13:31 GMT
make references DataNode consistent


Project: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/commit/00a2a368
Tree: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/tree/00a2a368
Diff: http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/diff/00a2a368

Branch: refs/heads/tutorial-proto
Commit: 00a2a3684b9074a11f720c72be61fd1672d5aa1f
Parents: 86ef700
Author: Lisa Owen <lowen@pivotal.io>
Authored: Thu Oct 20 10:59:58 2016 -0700
Committer: Lisa Owen <lowen@pivotal.io>
Committed: Thu Oct 20 10:59:58 2016 -0700

----------------------------------------------------------------------
 ddl/ddl-table.html.md.erb                                 | 2 +-
 install/aws-config.html.md.erb                            | 2 +-
 install/select-hosts.html.md.erb                          | 4 ++--
 overview/TableDistributionStorage.html.md.erb             | 2 +-
 pxf/TroubleshootingPXF.html.md.erb                        | 4 ++--
 query/query-performance.html.md.erb                       | 2 +-
 reference/HDFSConfigurationParameterReference.html.md.erb | 6 +++---
 7 files changed, 11 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/ddl/ddl-table.html.md.erb
----------------------------------------------------------------------
diff --git a/ddl/ddl-table.html.md.erb b/ddl/ddl-table.html.md.erb
index 62ece36..d0220d7 100644
--- a/ddl/ddl-table.html.md.erb
+++ b/ddl/ddl-table.html.md.erb
@@ -66,7 +66,7 @@ Foreign key constraints specify that the values in a column or a group of
column
 
 All HAWQ tables are distributed. The default is `DISTRIBUTED RANDOMLY` \(round-robin distribution\)
to determine the table row distribution. However, when you create or alter a table, you can
optionally specify `DISTRIBUTED BY` to distribute data according to a hash-based policy. In
this case, the `bucketnum` attribute sets the number of hash buckets used by a hash-distributed
table. Columns of geometric or user-defined data types are not eligible as HAWQ distribution
key columns. 
 
-Randomly distributed tables have benefits over hash distributed tables. For example, after
expansion, HAWQ's elasticity feature lets it automatically use more resources without needing
to redistribute the data. For extremely large tables, redistribution is very expensive. Also,
data locality for randomly distributed tables is better, especially after the underlying HDFS
redistributes its data during rebalancing or because of data node failures. This is quite
common when the cluster is large.
+Randomly distributed tables have benefits over hash distributed tables. For example, after
expansion, HAWQ's elasticity feature lets it automatically use more resources without needing
to redistribute the data. For extremely large tables, redistribution is very expensive. Also,
data locality for randomly distributed tables is better, especially after the underlying HDFS
redistributes its data during rebalancing or because of DataNode failures. This is quite common
when the cluster is large.
 
 However, hash distributed tables can be faster than randomly distributed tables. For example,
for TPCH queries, where there are several queries, HASH distributed tables can have performance
benefits. Choose a distribution policy that best suits your application scenario. When you
`CREATE TABLE`, you can also specify the `bucketnum` option. The `bucketnum` determines the
number of hash buckets used in creating a hash-distributed table or for PXF external table
intermediate processing. The number of buckets also affects how many virtual segments will
be created when processing this data. The bucketnumber of a gpfdist external table is the
number of gpfdist location, and the bucketnumber of a command external table is `ON #num`.
PXF external tables use the `default_hash_table_bucket_number` parameter to control virtual
segments. 
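For context, the two distribution policies discussed in this hunk can be sketched as follows; the table name, columns, and bucketnum value are hypothetical and not part of this commit:

    -- Round-robin (default) distribution
    CREATE TABLE sales_random (id int, region text, amount numeric)
    DISTRIBUTED RANDOMLY;

    -- Hash distribution on id, with an explicit bucket count
    CREATE TABLE sales_hash (id int, region text, amount numeric)
    WITH (bucketnum=16)
    DISTRIBUTED BY (id);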
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/install/aws-config.html.md.erb
----------------------------------------------------------------------
diff --git a/install/aws-config.html.md.erb b/install/aws-config.html.md.erb
index e4106b1..21cadf5 100644
--- a/install/aws-config.html.md.erb
+++ b/install/aws-config.html.md.erb
@@ -34,7 +34,7 @@ Virtual devices for instance store volumes for HAWQ EC2 instance store instances
 
 A placement group is a logical grouping of instances within a single availability zone that
together participate in a low-latency, 10 Gbps network.  Your HAWQ master and segment cluster
instances should support enhanced networking and reside in a single placement group (and subnet)
for optimal network performance.  
 
-If your Ambari node is not a data node, locating the Ambari node instance in a subnet separate
from the HAWQ master/segment placement group enables you to manage multiple HAWQ clusters
from the single Ambari instance.
+If your Ambari node is not a DataNode, locating the Ambari node instance in a subnet separate
from the HAWQ master/segment placement group enables you to manage multiple HAWQ clusters
from the single Ambari instance.
 
 Amazon recommends that you use the same instance type for all instances in the placement
group and that you launch all instances within the placement group at the same time.
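As an illustration only (the AMI, subnet, instance type, and count below are placeholders, not recommendations), launching the cluster instances into a single placement group with the AWS CLI might look like:

    # Create a cluster placement group, then launch the HAWQ instances into it
    aws ec2 create-placement-group --group-name hawq-cluster --strategy cluster
    aws ec2 run-instances --image-id ami-xxxxxxxx --count 4 \
        --instance-type d2.4xlarge \
        --placement GroupName=hawq-cluster \
        --subnet-id subnet-xxxxxxxx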
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/install/select-hosts.html.md.erb
----------------------------------------------------------------------
diff --git a/install/select-hosts.html.md.erb b/install/select-hosts.html.md.erb
index c49f184..c2fbdff 100644
--- a/install/select-hosts.html.md.erb
+++ b/install/select-hosts.html.md.erb
@@ -8,10 +8,10 @@ Complete this procedure for all HAWQ deployments:
 
 1.  **Choose the host machines that will host a HAWQ segment.** Keep in mind these restrictions
and requirements:
     -   Each host must meet the system requirements for the version of HAWQ you are installing.
-    -   Each HAWQ segment must be co-located on a host that runs an HDFS data node.
+    -   Each HAWQ segment must be co-located on a host that runs an HDFS DataNode.
     -   The HAWQ master segment and standby master segment must be hosted on separate machines.
 2.  **Choose the host machines that will run PXF.** Keep in mind these restrictions and requirements:
-    -   PXF must be installed on the HDFS NameNode *and* on all HDFS data nodes.
+    -   PXF must be installed on the HDFS NameNode *and* on all HDFS DataNodes.
     -   If you have configured Hadoop with high availability, PXF must also be installed
on all HDFS nodes including all NameNode services.
     -   If you want to use PXF with HBase or Hive, you must first install the HBase client
\(hbase-client\) and/or Hive client \(hive-client\) on each machine where you intend to install
PXF. See the [HDP installation documentation](http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.3.4/index.html)
for more information.
 3.  **Verify that required ports on all machines are unused.** By default, a HAWQ master
or standby master service configuration uses port 5432. Hosts that run other PostgreSQL instances
cannot be used to run a default HAWQ master or standby service configuration because the default
PostgreSQL port \(5432\) conflicts with the default HAWQ port. You must either change the
default port configuration of the running PostgreSQL instance or change the HAWQ master port
setting during the HAWQ service installation to avoid port conflicts.
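A quick way to perform the port check in step 3 on each candidate master/standby host (standard Linux tools; adjust the port if you plan to change the HAWQ master port):

    # Anything already listening on the default HAWQ master port?
    ss -lnt | grep ':5432' || echo 'port 5432 is free'
    # or, where ss is unavailable:
    netstat -lnt | grep ':5432' || echo 'port 5432 is free'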

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/overview/TableDistributionStorage.html.md.erb
----------------------------------------------------------------------
diff --git a/overview/TableDistributionStorage.html.md.erb b/overview/TableDistributionStorage.html.md.erb
index aa03b59..58f20f2 100755
--- a/overview/TableDistributionStorage.html.md.erb
+++ b/overview/TableDistributionStorage.html.md.erb
@@ -12,7 +12,7 @@ For all HAWQ table storage formats, AO \(Append-Only\) and Parquet, the
data fil
 
 The default table distribution policy in HAWQ is random.
 
-Randomly distributed tables have some benefits over hash distributed tables. For example,
after cluster expansion, HAWQ can use more resources automatically without redistributing
the data. For huge tables, redistribution is very expensive, and data locality for randomly
distributed tables is better after the underlying HDFS redistributes its data during rebalance
or data node failures. This is quite common when the cluster is large.
+Randomly distributed tables have some benefits over hash distributed tables. For example,
after cluster expansion, HAWQ can use more resources automatically without redistributing
the data. For huge tables, redistribution is very expensive, and data locality for randomly
distributed tables is better after the underlying HDFS redistributes its data during rebalance
or DataNode failures. This is quite common when the cluster is large.
 
 On the other hand, for some queries, hash distributed tables are faster than randomly distributed
tables. For example, hash distributed tables have some performance benefits for some TPC-H
queries. You should choose the distribution policy that is best suited for your application's
scenario.
 

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/pxf/TroubleshootingPXF.html.md.erb
----------------------------------------------------------------------
diff --git a/pxf/TroubleshootingPXF.html.md.erb b/pxf/TroubleshootingPXF.html.md.erb
index 7b53065..d59e361 100644
--- a/pxf/TroubleshootingPXF.html.md.erb
+++ b/pxf/TroubleshootingPXF.html.md.erb
@@ -49,8 +49,8 @@ The following table lists some common errors encountered while using PXF:
 <td>Cannot find PXF Jar</td>
 </tr>
 <tr class="even">
-<td>ERROR:  PXF API encountered a HTTP 404 error. Either the PXF service (tomcat)
on data node was not started or PXF webapp was not started.</td>
-<td>Either the required data node does not exist or PXF service (tcServer) on data
node is not started or PXF webapp was not started</td>
+<td>ERROR:  PXF API encountered a HTTP 404 error. Either the PXF service (tomcat)
on the DataNode was not started or the PXF webapp was not started.</td>
+<td>Either the required DataNode does not exist or PXF service (tcServer) on the DataNode
is not started or PXF webapp was not started</td>
 </tr>
 <tr class="odd">
 <td>ERROR:  remote component error (500) from '&lt;x&gt;':  type  Exception
report   message   java.lang.NoClassDefFoundError: org/apache/hadoop/hbase/client/HTableInterface</td>
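When chasing the HTTP 404 error above, a useful first check is whether the PXF web application answers on the DataNode at all. The port (51200) and service name below are the usual defaults but may differ in your installation:

    # Does the PXF webapp respond on this DataNode?
    curl "http://<datanode-host>:51200/pxf/ProtocolVersion"
    # Is the PXF service (tcServer/tomcat) running?
    sudo service pxf-service status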

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/query/query-performance.html.md.erb
----------------------------------------------------------------------
diff --git a/query/query-performance.html.md.erb b/query/query-performance.html.md.erb
index 4515575..b4f88fe 100644
--- a/query/query-performance.html.md.erb
+++ b/query/query-performance.html.md.erb
@@ -99,7 +99,7 @@ The following table describes the metrics related to data locality. Use
these me
 </tr>
 <tr class="odd">
 <td>continuity</td>
-<td>reading a HDFS file discontinuously will introduce additional seek, which will
slow the table scan of a query. A low value of continuity indicates that the blocks of a file
are not continuously distributed on a datanode.</td>
+<td>reading a HDFS file discontinuously will introduce additional seek, which will
slow the table scan of a query. A low value of continuity indicates that the blocks of a file
are not continuously distributed on a DataNode.</td>
 </tr>
 <tr class="even">
 <td>DFS metadatacache</td>
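The data locality metrics in this table, continuity among them, are reported alongside a query's execution statistics; one way to surface them for a particular scan (the table name is hypothetical, and the exact output format varies by HAWQ version):

    -- Look for the data locality statistics in the EXPLAIN ANALYZE output
    EXPLAIN ANALYZE SELECT count(*) FROM lineitem;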

http://git-wip-us.apache.org/repos/asf/incubator-hawq-docs/blob/00a2a368/reference/HDFSConfigurationParameterReference.html.md.erb
----------------------------------------------------------------------
diff --git a/reference/HDFSConfigurationParameterReference.html.md.erb b/reference/HDFSConfigurationParameterReference.html.md.erb
index 8199de2..aef4ed2 100644
--- a/reference/HDFSConfigurationParameterReference.html.md.erb
+++ b/reference/HDFSConfigurationParameterReference.html.md.erb
@@ -13,13 +13,13 @@ This table describes the configuration parameters and values that are
recommende
 | Parameter                                 | Description                               
                                                                                         
                                                                              | Recommended
Value for HAWQ Installs                                   | Comments                     
                                                                                         
                                               |
 |-------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
 | `dfs.allow.truncate`                      | Allows truncate.                          
                                                                                         
                                                                              | true     
                                                            | HAWQ requires that you enable
`dfs.allow.truncate`. The HAWQ service will fail to start if `dfs.allow.truncate` is not set
to `true`.                                  |
-| `dfs.block.access.token.enable`           | If `true`, access tokens are used as capabilities
for accessing datanodes. If `false`, no access tokens are checked on accessing datanodes.
                                                                       | *false* for an unsecured
HDFS cluster, or *true* for a secure cluster |                                          
                                                                                         
                                   |
+| `dfs.block.access.token.enable`           | If `true`, access tokens are used as capabilities
for accessing DataNodes. If `false`, no access tokens are checked on accessing DataNodes.
                                                                       | *false* for an unsecured
HDFS cluster, or *true* for a secure cluster |                                          
                                                                                         
                                   |
 | `dfs.block.local-path-access.user`        | Comma separated list of the users allowed to
open block files on legacy short-circuit local read.                                     
                                                                            | gpadmin    
                                                          |                             
                                                                                         
                                                |
 | `dfs.client.read.shortcircuit`            | This configuration parameter turns on short-circuit
local reads.                                                                             
                                                                     | true              
                                                   | In Ambari, this parameter corresponds
to **HDFS Short-circuit read**. The value for this parameter should be the same in `hdfs-site.xml`
and HAWQ's `hdfs-client.xml`. |
 | `dfs.client.socket-timeout`               | The amount of time before a client connection
times out when establishing a connection or reading. The value is expressed in milliseconds.
                                                                        | 300000000      
                                                      |                                 
                                                                                         
                                            |
 | `dfs.client.use.legacy.blockreader.local` | Setting this value to false specifies that
the new version of the short-circuit reader is used. Setting this value to true means that
the legacy short-circuit reader would be used.                               | false     
                                                           |                            
                                                                                         
                                                 |
-| `dfs.datanode.data.dir.perm`              | Permissions for the directories on on the local
filesystem where the DFS data node store its blocks. The permissions can either be octal or
symbolic.                                                              | 750             
                                                     | In Ambari, this parameter corresponds
to **DataNode directories permission**                                                   
                                       |
-| `dfs.datanode.handler.count`              | The number of server threads for the datanode.
                                                                                         
                                                                          | 60           
                                                        |                               
                                                                                         
                                              |
+| `dfs.datanode.data.dir.perm`              | Permissions for the directories on on the local
filesystem where the DFS DataNode stores its blocks. The permissions can either be octal or
symbolic.                                                              | 750             
                                                     | In Ambari, this parameter corresponds
to **DataNode directories permission**                                                   
                                       |
+| `dfs.datanode.handler.count`              | The number of server threads for the DataNode.
                                                                                         
                                                                          | 60           
                                                        |                               
                                                                                         
                                              |
 | `dfs.datanode.max.transfer.threads`       | Specifies the maximum number of threads to
use for transferring data in and out of the DataNode.                                    
                                                                              | 40960    
                                                            | In Ambari, this parameter corresponds
to **DataNode max data transfer threads**                                                
                                       |
 | `dfs.datanode.socket.write.timeout`       | The amount of time before a write operation
times out, expressed in milliseconds.                                                    
                                                                             | 7200000   
                                                           |                            
                                                                                         
                                                 |
 | `dfs.domain.socket.path`                  | (Optional.) The path to a UNIX domain socket
to use for communication between the DataNode and local HDFS clients. If the string "\_PORT"
is present in this path, it is replaced by the TCP port of the DataNode. |              
                                                        | If set, the value for this parameter
should be the same in `hdfs-site.xml` and HAWQ's `hdfs-client.xml`.                      
                                        |
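For reference, an excerpt of what a few of the recommended settings above look like in hdfs-site.xml (illustrative only; merge into your existing configuration rather than replacing it):

    <property>
      <name>dfs.allow.truncate</name>
      <value>true</value>
    </property>
    <property>
      <name>dfs.datanode.handler.count</name>
      <value>60</value>
    </property>
    <property>
      <name>dfs.datanode.max.transfer.threads</name>
      <value>40960</value>
    </property>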

