hawq-dev mailing list archives

From dyozie <...@git.apache.org>
Subject [GitHub] incubator-hawq-docs pull request #60: HAWQ-1151 - add ambari procedures
Date Tue, 15 Nov 2016 00:17:15 GMT
Github user dyozie commented on a diff in the pull request:

    https://github.com/apache/incubator-hawq-docs/pull/60#discussion_r87922081
  
    --- Diff: ddl/ddl-table.html.md.erb ---
    @@ -93,14 +93,14 @@ For any specific query, the first four factors are fixed values, while the confi
     
     The `bucketnum` for a hash table specifies the number of hash buckets to be used in creating virtual segments. A HASH distributed table is created with `default_hash_table_bucket_number` buckets. The default bucket value can be changed in session level or in the `CREATE TABLE` DDL by using the `bucketnum` storage parameter.
     
    -When initializing a cluster, you can use the `hawq init --bucket_number` parameter to explcitly set the default bucket number \(`default_hash_table_bucket_number`\).
    +In an Ambari-managed HAWQ cluster, the default bucket number \(`default_hash_table_bucket_number`\) is derived from the number of segment nodes. In command-line-managed HAWQ environments, you can use the `--bucket_number` option of `hawq init` to explicitly set `default_hash_table_bucket_number` during cluster initialization.
     
    -**Note:** For best performance with large tables, the number of buckets should not exceed the value of the `default_hash_table_bucket_number` parameter. Small tables can use one segment node, `with bucketnum=1`. For larger tables, the bucketnum is set to a multiple of the number of segment nodes, for the best load balancing on different segment nodes. The elastic runtime will attempt to find the optimal number of buckets for the number of nodes being processed. Larger tables need more virtual segments , and hence use larger numbers of buckets.
    +**Note:** For best performance with large tables, the number of buckets should not exceed the value of the `default_hash_table_bucket_number` parameter. Small tables can use one segment node, `WITH bucketnum=1`. For larger tables, the `bucketnum` is set to a multiple of the number of segment nodes, for the best load balancing on different segment nodes. The elastic runtime will attempt to find the optimal number of buckets for the number of nodes being processed. Larger tables need more virtual segments , and hence use larger numbers of buckets.
    --- End diff --
    
    Might as well fix (remove) the space before the comma here: "virtual segments , and" should read "virtual segments, and".


