hawq-dev mailing list archives

From Yi Jin <y...@pivotal.io>
Subject Re: Question on hawq_rm_nvseg_perquery_limit
Date Wed, 13 Jul 2016 05:40:34 GMT
Hi Vineet,

Here are some comments.

For question 1.
perquery_limit was introduced mainly to restrict resource usage in large-scale
clusters; perquery_perseg_limit is there to avoid allocating too many
processes on one segment, which may cause serious performance issues. So the
two GUCs address different performance aspects. Depending on the cluster
scale, only one of the two limits actually takes effect. We don't need both
of them to be active for resource allocation.
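To make the "only one limit takes effect" point concrete, here is a minimal sketch of how the two limits could combine. It assumes the per-query ceiling is simply the smaller of perquery_limit and perquery_perseg_limit times the number of segment hosts; the function names are hypothetical, and the model ignores resource queue quotas, available memory/cores and data locality, which can lower the real count further.

```python
# Simplified, hypothetical model of how the two per-query v-seg limits combine.
# Assumption: ignores resource queue quotas, available memory/cores and data
# locality, which can reduce the real number of virtual segments further.

def effective_vseg_cap(segment_hosts: int,
                       perquery_limit: int = 512,
                       perseg_limit: int = 6) -> int:
    """Return the upper bound on virtual segments for a single query."""
    # perseg_limit applies per segment host, so summed over all hosts it
    # yields a cluster-wide ceiling.
    cluster_ceiling = perseg_limit * segment_hosts
    # Only the smaller of the two ceilings actually takes effect.
    return min(perquery_limit, cluster_ceiling)


def binding_limit(segment_hosts: int,
                  perquery_limit: int = 512,
                  perseg_limit: int = 6) -> str:
    """Name the GUC that determines the cap (ties go to the per-segment limit)."""
    if perquery_limit < perseg_limit * segment_hosts:
        return "hawq_rm_nvseg_perquery_limit"
    return "hawq_rm_nvseg_perquery_perseg_limit"


if __name__ == "__main__":
    for hosts in (10, 50, 100):
        print(hosts, effective_vseg_cap(hosts), binding_limit(hosts))
```

With the default values, this model caps a 10-node cluster at 60 v-segs (the per-segment limit binds) and a 100-node cluster at 512 (the per-query limit binds).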

For question 2.

In fact, perquery_perseg_limit is a general resource restriction for all
queries, not only queries on hash tables and external tables; this is why
this GUC is not merged with the other one. For example, when we run queries
on randomly distributed tables, it does not make sense for the resource
manager to refer to a GUC intended for hash tables.
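A toy model can illustrate the tension raised in question 2. It assumes, following the premise in the quoted question, that a query on a hash-distributed table needs exactly bucket_number virtual segments and fails if that exceeds hawq_rm_nvseg_perquery_limit, while a query on a randomly distributed table just receives as many virtual segments as the general limits allow. The function plan_vsegs is hypothetical and is not HAWQ's resource manager logic.

```python
# Toy contrast between hash-distributed and randomly distributed tables.
# Assumptions (hypothetical model, not HAWQ's actual resource manager):
#  1. a hash-table query needs exactly bucket_number virtual segments and
#     fails if that exceeds hawq_rm_nvseg_perquery_limit;
#  2. a random-table query takes as many v-segs as the general limits allow.

def plan_vsegs(distribution: str,
               segment_hosts: int,
               bucket_number: int = 0,
               perquery_limit: int = 512,
               perseg_limit: int = 6) -> int:
    """Return the v-seg count a query would get under this toy model."""
    if distribution == "hash":
        # Hash tables are tied to their bucket count, so they cannot shrink.
        if bucket_number > perquery_limit:
            raise ValueError(
                f"bucket number {bucket_number} exceeds "
                f"hawq_rm_nvseg_perquery_limit {perquery_limit}")
        return bucket_number
    if distribution == "random":
        # No bucket constraint: the general per-query ceiling applies.
        return min(perquery_limit, perseg_limit * segment_hosts)
    raise ValueError(f"unknown distribution: {distribution!r}")


if __name__ == "__main__":
    print(plan_vsegs("random", 10))                     # default limits
    print(plan_vsegs("random", 10, perquery_limit=40))  # tuned down
    print(plan_vsegs("hash", 10, bucket_number=60))
```

In this model, lowering perquery_limit to 40 on the 10-node cluster shrinks a random-table query from 60 to 40 v-segs, but a hash-table query with 60 buckets would raise an error instead, which is why a general limit and a hash-specific bucket number behave so differently.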

For the last topic item.

In my opinion, it is not necessary to adjust hawq_rm_nvseg_perquery_limit.
We can just leave it unchanged, effectively inactive, until we really want
to run a large-scale HAWQ cluster, for example one with 100+ nodes.
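The point at which the default perquery_limit starts to matter can be checked with a little arithmetic. Assuming the effective cap is simply the smaller of perquery_limit and perquery_perseg_limit times the number of segment hosts (a simplification that ignores resource queues and available resources), perquery_limit only binds once perseg_limit * hosts exceeds it. The helper below is hypothetical.

```python
def crossover_hosts(perquery_limit: int = 512, perseg_limit: int = 6) -> int:
    """Smallest segment-host count at which perquery_limit starts to bind.

    Under the simplified model, perquery_limit binds once
    perseg_limit * hosts > perquery_limit, i.e. hosts > perquery_limit / perseg_limit.
    """
    # Integer division gives the largest host count that is still at or
    # below the per-query limit; one more host pushes it over.
    return perquery_limit // perseg_limit + 1


if __name__ == "__main__":
    n = crossover_hosts()
    # Show the per-segment ceiling just below and at the crossover.
    print(n, 6 * (n - 1), 6 * n)
```

With the defaults this gives 86 segment hosts (6 * 85 = 510 is still under 512, while 6 * 86 = 516 is over), so the "100+ nodes" figure above is a round number in the same range.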


On Wed, Jul 13, 2016 at 1:18 PM, Vineet Goel <vvineet@apache.org> wrote:

> Hi all,
> I’m trying to document some GUC usage in detail and have questions on
> hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit
> tuning.
> *hawq_rm_nvseg_perquery_limit* (default value = 512). Let’s call it
> *perquery_limit* in short.
> *hawq_rm_nvseg_perquery_perseg_limit* (default value = 6). Let’s call it
> *perquery_perseg_limit* in short.
> 1) Is there ever any benefit in having perquery_limit *greater than*
> (perquery_perseg_limit * segment host count) ?
> For example, in a 10-node cluster, HAWQ will never allocate more than (GUC
> default 6 * 10 =) 60 v-segs, so the perquery_limit default of 512 doesn’t
> have any effect. It seems perquery_limit overrides (takes effect over)
> perquery_perseg_limit only when its value is less than
> (perquery_perseg_limit * segment host count).
> Is that the correct assumption? That would make sense, as users may want to
> keep a check on how much processing a single query can take up (that
> implies that the limit must be lower than the total possible v-segs). Or,
> it may make sense in large clusters (100-nodes or more) where we need to
> limit the pressure on HDFS.
> 2) Now, if the purpose of hawq_rm_nvseg_perquery_limit is to keep a check
> on single query resource usage (by limiting the # of v-segs), doesn’t it
> affect default_hash_table_bucket_number, because queries will fail when
> *default_hash_table_bucket_number* is greater than
> hawq_rm_nvseg_perquery_limit? In that case, the purpose of
> hawq_rm_nvseg_perquery_limit conflicts with the ability to run queries on
> HASH dist tables. This then means that tuning hawq_rm_nvseg_perquery_limit
> down is not a good idea, which seems to conflict with the purpose of the GUC
> (in relation to the other GUC).
> Perhaps someone can provide some examples on *how and when would you
> tune hawq_rm_nvseg_perquery_limit* in this 10-node example:
> *Defaults on a 10-node cluster are:*
> a) *hawq_rm_nvseg_perquery_perseg_limit* = 6 (hence ability to spin up 6 *
> 10 = 60 total v-segs for random tables)
> b) *hawq_rm_nvseg_perquery_limit* = 512 (but HAWQ will never dispatch more
> than 60 v-segs on a random table, so a value of 512 does not seem practical)
> c) *default_hash_table_bucket_number* = 60 (6 * 10)
> Thanks
> Vineet
