hawq-dev mailing list archives

From Lei Chang <lei_ch...@apache.org>
Subject Re: Question on hawq_rm_nvseg_perquery_limit
Date Wed, 13 Jul 2016 08:00:17 GMT
On Wed, Jul 13, 2016 at 3:16 PM, Vineet Goel <vvineet@apache.org> wrote:

> This leads me to another question on Apache Ambari UI integration.
>
> It seems the need to tune hawq_rm_nvseg_perquery_limit is minimal, as we
> seem to prescribe a limit of 512 regardless of cluster size. If that's the
> case, two options come to mind:
>
> 1) Either the "default" hawq_rm_nvseg_perquery_limit should be the lower
> value between (6 * segment host count) and 512. This way, it's less
> confusing to users and there is a logic behind the value.
>

If Ambari uses the lower value, it becomes difficult to change
hawq_rm_nvseg_perquery_perseg_limit later.

For example, if we want to raise hawq_rm_nvseg_perquery_perseg_limit to 8
for better performance on a lower-concurrency workload, that is no longer
doable, because hawq_rm_nvseg_perquery_limit would still cap the query at
the old total.
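
As a rough sketch of the arithmetic behind this concern (a simplified model
taken from the discussion in this thread, not HAWQ's actual allocation code;
the function and variable names are made up for illustration), assume the
per-query vseg cap is the lower of the two limits:

```python
def effective_vseg_cap(perquery_limit: int, perseg_limit: int, segment_hosts: int) -> int:
    """Simplified model: whichever of the two limits is lower takes effect."""
    return min(perquery_limit, perseg_limit * segment_hosts)


hosts = 10  # the 10-node example from this thread

# Current defaults: perquery_limit = 512, perseg_limit = 6.
default_cap = effective_vseg_cap(512, 6, hosts)

# Raising perseg_limit to 8 while perquery_limit stays at 512.
tuned_cap = effective_vseg_cap(512, 8, hosts)

# Same change, but with perquery_limit pinned to min(6 * hosts, 512).
pinned_limit = min(6 * hosts, 512)
pinned_cap = effective_vseg_cap(pinned_limit, 8, hosts)

print(default_cap, tuned_cap, pinned_cap)
```

Under this model, raising the per-segment limit only helps when the
per-query limit leaves headroom; with the pinned default, both GUCs would
have to be changed together.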


>
> 2) Or, the parameter should not be exposed on the UI, leaving the default
> to 512. When/why would a user want to change this value?
>

I think this is an advanced configuration that is only needed in some cases.
Not exposing it is fine, but I think we still need a way to change it.

Users who want to increase the maximum degree of parallelism should change
this value. For example, if the end-user workload consists only of simple,
easy-to-scale queries on a large cluster, it is fine to tune the value.
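
To illustrate when tuning this value matters, here is a minimal sketch using
the same simplified model discussed in this thread (the lower of the two
limits wins). The 100-node cluster size and the new limit of 600 are
hypothetical values chosen only for the example, and the function name is
not a HAWQ API:

```python
def max_vsegs_per_query(perquery_limit: int, perseg_limit: int, segment_hosts: int) -> int:
    """Simplified model: a query gets at most the lower of the two limits."""
    return min(perquery_limit, perseg_limit * segment_hosts)


# Hypothetical large cluster: 100 hosts * 6 vsegs per host = 600 possible vsegs.
large_before = max_vsegs_per_query(512, 6, 100)  # default per-query limit applies
large_after = max_vsegs_per_query(600, 6, 100)   # per-query limit raised to 600

# On the 10-node cluster, the same change makes no difference.
small_after = max_vsegs_per_query(600, 6, 10)

print(large_before, large_after, small_after)
```

In this model the per-query limit only becomes the binding constraint once
(per-segment limit * host count) exceeds it, which is why it mostly matters
on large clusters.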


>
> Thoughts?
>
> Vineet
>
>
> On Tue, Jul 12, 2016 at 11:51 PM, Hubert Zhang <hzhang@pivotal.io> wrote:
>
> > +1 with Yi's answer.
> > Vseg numbers are controlled by the Resource Negotiator (a module that runs
> > before the planner). All the vseg-related GUCs affect the behaviour of the
> > RN, and some of them also affect the Resource Manager.
> > To be specific, hawq_rm_nvseg_perquery_limit and
> > hawq_rm_nvseg_perquery_perseg_limit are both considered by the Resource
> > Negotiator (RN) and the Resource Manager (RM), while
> > default_hash_table_bucket_number is only considered by the RN.
> > As a result, suppose default_hash_table_bucket_number = 60: a query like
> > "select * from hash_table" will request 60 vsegs in the RN, and if
> > hawq_rm_nvseg_perquery_limit is less than 60, the RM will not be able to
> > allocate 60 vsegs.
> >
> > So we need to ensure default_hash_table_bucket_number does not exceed the
> > capacity of the RM.
> >
> > On Wed, Jul 13, 2016 at 1:40 PM, Yi Jin <yjin@pivotal.io> wrote:
> >
> > > Hi Vineet,
> > >
> > > Some comments from me.
> > >
> > > For question 1.
> > > Yes. perquery_limit was introduced mainly to restrict resource usage in
> > > large-scale clusters; perquery_perseg_limit is there to avoid allocating
> > > too many processes in one segment, which may cause serious performance
> > > issues. So the two GUCs address different performance aspects. Depending
> > > on the cluster scale, only one of the two limits actually takes effect.
> > > We don't have to keep both active for resource allocation.
> > >
> > > For question 2.
> > >
> > > In fact, perquery_perseg_limit is a general resource restriction for all
> > > queries, not only hash table queries and external table queries; this is
> > > why this GUC is not merged with another one. For example, when we run
> > > queries on randomly distributed tables, it does not make sense to let
> > > the resource manager refer to a GUC for hash tables.
> > >
> > > For the last topic item.
> > >
> > > In my opinion, it is not necessary to adjust hawq_rm_nvseg_perquery_limit.
> > > We can just leave it unchanged, and effectively inactive, until we
> > > really want to run a large-scale HAWQ cluster, for example, 100+ nodes.
> > >
> > > Best,
> > > Yi
> > >
> > > On Wed, Jul 13, 2016 at 1:18 PM, Vineet Goel <vvineet@apache.org> wrote:
> > >
> > > > Hi all,
> > > >
> > > > I’m trying to document some GUC usage in detail and have questions on
> > > > hawq_rm_nvseg_perquery_limit and hawq_rm_nvseg_perquery_perseg_limit
> > > > tuning.
> > > >
> > > > *hawq_rm_nvseg_perquery_limit* (default value = 512). Let’s call it
> > > > *perquery_limit* for short.
> > > > *hawq_rm_nvseg_perquery_perseg_limit* (default value = 6). Let’s call it
> > > > *perquery_perseg_limit* for short.
> > > >
> > > >
> > > > 1) Is there ever any benefit in having perquery_limit *greater than*
> > > > (perquery_perseg_limit * segment host count)?
> > > > For example, in a 10-node cluster, HAWQ will never allocate more than
> > > > (GUC default 6 * 10 =) 60 v-segs, so the perquery_limit default of 512
> > > > doesn’t have any effect. It seems perquery_limit overrides (takes effect
> > > > over) perquery_perseg_limit only when its value is less than
> > > > (perquery_perseg_limit * segment host count).
> > > >
> > > > Is that the correct assumption? That would make sense, as users may want
> > > > to keep a check on how much processing a single query can take up (that
> > > > implies that the limit must be lower than the total possible v-segs). Or,
> > > > it may make sense in large clusters (100 nodes or more) where we need to
> > > > limit the pressure on HDFS.
> > > >
> > > >
> > > > 2) Now, if the purpose of hawq_rm_nvseg_perquery_limit is to keep a check
> > > > on single-query resource usage (by limiting the # of v-segs), doesn’t it
> > > > affect default_hash_table_bucket_number, because queries will fail when
> > > > *default_hash_table_bucket_number* is greater than
> > > > hawq_rm_nvseg_perquery_limit? In that case, the purpose of
> > > > hawq_rm_nvseg_perquery_limit conflicts with the ability to run queries on
> > > > HASH dist tables. This then means that tuning hawq_rm_nvseg_perquery_limit
> > > > down is not a good idea, which seems to conflict with the purpose of the
> > > > GUC (in relation to the other GUCs).
> > > >
> > > >
> > > > Perhaps someone can provide some examples of *how and when you would
> > > > tune hawq_rm_nvseg_perquery_limit* in this 10-node example:
> > > >
> > > > *Defaults on a 10-node cluster are:*
> > > > a) *hawq_rm_nvseg_perquery_perseg_limit* = 6 (hence the ability to spin up
> > > > 6 * 10 = 60 total v-segs for random tables)
> > > > b) *hawq_rm_nvseg_perquery_limit* = 512 (but HAWQ will never dispatch more
> > > > than 60 v-segs on a random table, so a value of 512 does not seem practical)
> > > > c) *default_hash_table_bucket_number* = 60 (6 * 10)
> > > >
> > > >
> > > >
> > > > Thanks
> > > > Vineet
> > > >
> > >
> >
> >
> >
> > --
> > Thanks
> >
> > Hubert Zhang
> >
>
