hawq-dev mailing list archives

From Hubert Zhang <hzh...@pivotal.io>
Subject Re: Question on hawq_rm_nvseg_perquery_limit
Date Wed, 13 Jul 2016 06:51:36 GMT
+1 to Yi's answer.
Vseg numbers are controlled by the Resource Negotiator (a module that runs
before the planner). All of the vseg-related GUCs affect the behaviour of the
RN, and some of them also affect the Resource Manager.
To be specific, hawq_rm_nvseg_perquery_limit and
hawq_rm_nvseg_perquery_perseg_limit are both considered by the Resource
Negotiator (RN) and the Resource Manager (RM), while
default_hash_table_bucket_number is considered only by the RN.
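(For reference, the current values can be checked from any psql session; the
comments below just restate the defaults quoted later in this thread.)

  SHOW hawq_rm_nvseg_perquery_limit;          -- considered by RN and RM, default 512
  SHOW hawq_rm_nvseg_perquery_perseg_limit;   -- considered by RN and RM, default 6
  SHOW default_hash_table_bucket_number;      -- considered by RN only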
As a result, suppose default_hash_table_bucket_number = 60: a query like
"select * from hash_table" will request 60 vsegs in the RN, and if
hawq_rm_nvseg_perquery_limit is less than 60, the RM will not be able to
allocate those 60 vsegs.

So we need to ensure that default_hash_table_bucket_number does not exceed
the number of vsegs the RM can grant.
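
As a rough sketch (the table definition below is only for illustration, and
the exact error reported by RM will differ):

  -- a hash-distributed table; its queries always request exactly bucketnum vsegs
  CREATE TABLE hash_table (id int) WITH (bucketnum = 60) DISTRIBUTED BY (id);

  -- RN requests 60 vsegs here; if hawq_rm_nvseg_perquery_limit < 60,
  -- RM cannot grant them and the query fails
  SELECT * FROM hash_table;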

On Wed, Jul 13, 2016 at 1:40 PM, Yi Jin <yjin@pivotal.io> wrote:

> Hi Vineet,
>
> A few comments from me.
>
> For question 1.
> Yes,
> perquery_limit is introduced mainly to restrict resource usage in a
> large-scale cluster; perquery_perseg_limit is to avoid allocating too many
> processes in one segment, which may cause serious performance issues. So the
> two GUCs address different performance aspects. Depending on the cluster
> scale, only one of the two limits actually takes effect; we don't have to
> have both active for resource allocation.
>
> For question 2.
>
> In fact, perquery_perseg_limit is a general resource restriction for all
> queries, not only hash-table and external-table queries; this is why this
> GUC is not merged with the other one. For example, when we run queries on
> randomly distributed tables, it does not make sense to have the resource
> manager refer to a GUC intended for hash tables.
>
> For the last topic item.
>
> In my opinion, it is not necessary to adjust hawq_rm_nvseg_perquery_limit;
> we can just leave it unchanged (and effectively inactive) until we really
> want to run a large-scale HAWQ cluster, for example 100+ nodes.
>
> Best,
> Yi
>
> On Wed, Jul 13, 2016 at 1:18 PM, Vineet Goel <vvineet@apache.org> wrote:
>
> > Hi all,
> >
> > I’m trying to document some GUC usage in detail and have questions on
> > hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit
> > tuning.
> >
> > *hawq_rm_nvseg_perquery_limit* (default value = 512). Let’s call it
> > *perquery_limit* for short.
> > *hawq_rm_nvseg_perquery_perseg_limit* (default value = 6). Let’s call it
> > *perquery_perseg_limit* for short.
> >
> >
> > 1) Is there ever any benefit in having perquery_limit *greater than*
> > (perquery_perseg_limit * segment host count)?
> > For example, in a 10-node cluster HAWQ will never allocate more than (GUC
> > default 6 * 10 =) 60 v-segs, so the perquery_limit default of 512 doesn’t
> > have any effect. It seems perquery_limit overrides (i.e., takes effect
> > over) perquery_perseg_limit only when its value is less than
> > (perquery_perseg_limit * segment host count).
> >
> > Is that the correct assumption? That would make sense, as users may want
> > to keep a check on how much processing a single query can take up (which
> > implies that the limit must be lower than the total possible v-segs). Or,
> > it may make sense in large clusters (100 nodes or more) where we need to
> > limit the pressure on HDFS.
> >
> >
> > 2) Now, if the purpose of hawq_rm_nvseg_perquery_limit is to keep a check
> > on single-query resource usage (by limiting the # of v-segs), doesn’t it
> > affect default_hash_table_bucket_number, because queries will fail when
> > *default_hash_table_bucket_number* is greater than
> > hawq_rm_nvseg_perquery_limit? In that case, the purpose of
> > hawq_rm_nvseg_perquery_limit conflicts with the ability to run queries on
> > HASH-distributed tables. This then means that tuning
> > hawq_rm_nvseg_perquery_limit down is not a good idea, which seems to
> > conflict with the purpose of the GUC (in relation to the other GUCs).
> >
> >
> > Perhaps someone can provide some examples of *how and when you would
> > tune hawq_rm_nvseg_perquery_limit* in this 10-node example:
> >
> > *Defaults on a 10-node cluster are:*
> > a) *hawq_rm_nvseg_perquery_perseg_limit* = 6 (hence the ability to spin up
> > 6 * 10 = 60 total v-segs for random tables)
> > b) *hawq_rm_nvseg_perquery_limit* = 512 (but HAWQ will never dispatch more
> > than 60 v-segs on a random table, so the value of 512 does not seem
> > practical)
> > c) *default_hash_table_bucket_number* = 60 (6 * 10)
> >
> >
> >
> > Thanks
> > Vineet
> >
>



-- 
Thanks

Hubert Zhang
