hawq-dev mailing list archives

From Yi Jin <y...@pivotal.io>
Subject Re: hawq_rm_stmt_vseg_memory and hawq_rm_nvseg_perquery_perseg_limit
Date Mon, 23 Jan 2017 04:05:58 GMT
Hi Jon,

For scenario 1, that is true. However, do you think it is necessary to give one
vseg such a high memory quota? Of course, I think that if we force
hawq_rm_stmt_nvseg to a value different from the bucket number, the optimizer
will give a different plan. This is a way to force high concurrency and high
resource occupation.
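
For instance, a minimal sketch of such a statement-level override (the GUC
names are the ones from this thread; the values are only illustrative):

set hawq_rm_stmt_vseg_memory = '4GB';  -- per-vseg memory quota for statements in this session
set hawq_rm_stmt_nvseg = 12;           -- force 12 vsegs rather than the table's bucket number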

For scenario 2, I think if we want to release more concurrency in executing
one query, we should advise the customer to
increase hawq_rm_nvseg_perquery_limit (default 512)
and hawq_rm_nvseg_perquery_perseg_limit (default 6), and then choose an
appropriate vseg memory quota. As a result, HAWQ will decide to leverage
more concurrency and more resources for big queries automatically. I mean
HAWQ will choose the number of virtual segments automatically, and it is not
always a high number of virtual segments.
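
As a hedged sketch of that tuning, assuming these GUCs can also be adjusted at
the session level (cluster-wide they would go into hawq-site.xml); the values
are only illustrative:

set hawq_rm_nvseg_perquery_limit = 1024;       -- raise the per-query vseg cap (default 512)
set hawq_rm_nvseg_perquery_perseg_limit = 12;  -- allow more vsegs per segment host (default 6)
-- the per-vseg memory quota itself comes from the target resource queue definition,
-- so HAWQ still picks the number of vsegs for each query on its own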

For scenario 3, it is recommended not to assign all system memory to HAWQ
on the segment side. It is a good idea to leave some for the operating system
and for other applications.
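
For example, a sketch of how the per-segment cap might be set in hawq-site.xml,
assuming hawq_rm_memory_limit_perseg is the GUC used for this purpose (the
value is only illustrative):

<property>
    <name>hawq_rm_memory_limit_perseg</name>
    <value>224GB</value>
</property>
<!-- leaves roughly 32GB of a 256GB node for the OS and other services -->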

I think it is fine to support a higher memory limit for one virtual segment,
but I still have no idea why it is necessary.

As HAWQ uses a different memory allocation strategy from Greenplum, it is
true that the external behavior of HAWQ is different. The virtual segment
(vseg) is a critical concept in HAWQ for managing resource consumption and
query QE concurrency. In fact, for Greenplum the QE concurrency is fixed,
i.e. the number of QEs for one query execution is fixed; this is why a
Greenplum user just needs to think about memory consumption in one segment, as
all segments consume the same amount of memory for the same query. Back in
HAWQ, this number is dynamic: you can easily change the resource queue
definition or the statement resource definition to change it, and you don't
have to install and configure more segments on those nodes to force higher QE
concurrency.
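
As a rough illustration of both options (the queue syntax and attribute name
here are a sketch from memory, so treat them as an assumption; the statement
GUCs are the ones discussed in this thread):

alter resource queue pg_default with (vseg_resource_quota='mem:4gb');  -- queue-level per-vseg quota
-- or, per statement, without touching the queue definition:
set hawq_rm_stmt_vseg_memory = '4GB';
set hawq_rm_stmt_nvseg = 24;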

Consider a scenario where a lot of small-scale queries arrive concurrently:
HAWQ just allocates a small number of vsegs for each query, without wasting
resources, and this makes HAWQ able to support high concurrency much better
than a traditional MPP-architecture database and achieve high throughput.
Conversely, for large-scale queries, HAWQ can automatically increase the
number of vsegs to a very high number if you set those limit GUCs high as
well.

Best,
Yi

On Mon, Jan 23, 2017 at 12:30 PM, Jon Roberts <jroberts@pivotal.io> wrote:

> I've been thinking about these scenarios:
>
> 1.  Hash distributed tables with fixed number of buckets.
> If the tables were built using the defaults, buckets = 6 * number of
> nodes.  So you basically have 6 vsegs per host.  Multiply that by 16GB and
> you can only use 96GB of the 256GB of RAM per node.
>
> 2.  A user has randomly distributed tables but doesn't understand they can
> increase the number of vsegs.
> This will be common for users that come from Greenplum.  They again can
> only set statement memory to 16GB so they are stuck with a max of 96GB of
> RAM usage.
>
> 3.  User increases vsegs and statement memory.
> They could possibly run out of memory if the settings are too aggressive.
>
> - I think we should be able to specify statement memory higher than 16GB.
> Maybe the limit should be something much higher such as 1TB.
>
> - The optimizer should limit the number of vsegs based on the statement memory
> setting to prevent OOM.  You could do the opposite too (limit memory and
> use the vseg setting provided).  Greenplum can limit the amount of memory
> but we have two dials to adjust: vsegs and memory.
>
>
> Jon Roberts
>
> On Sun, Jan 22, 2017 at 5:20 PM, Yi Jin <yjin@pivotal.io> wrote:
>
> > Hi Jon,
> >
> > That GUC limit means that for one virtual segment, the maximum
> > consumable memory is 16GB. For one segment/node, there may be multiple
> > vsegs allocated to run queries, so if a node has 256GB expected to be
> > consumed by HAWQ, it will have at most 16 vsegs running concurrently.
> >
> > hawq_rm_stmt_vseg_memory is for setting the statement-level vseg memory
> > consumption. It requires hawq_rm_stmt_nvseg to be specified as well: only
> > when hawq_rm_stmt_nvseg is greater than 0 is hawq_rm_stmt_vseg_memory
> > activated, regardless of the original target resource queue's vseg
> > resource quota definition. For example, you can set
> > hawq_rm_stmt_vseg_memory to 16GB and hawq_rm_stmt_nvseg to 256; if you
> > have a cluster of 16 nodes with 256GB each and your target resource queue
> > can use 100% of the cluster resources, you will have 16 vsegs running per
> > node, consuming all the memory resource for this query.
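> >
> > In GUC form, a small sketch using the numbers from this example:
> >
> > set hawq_rm_stmt_vseg_memory = '16GB';
> > set hawq_rm_stmt_nvseg = 256;
> > -- 256 vsegs over 16 nodes = 16 vsegs per node; 16 x 16GB = 256GB per node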
> >
> > Best,
> > Yi
> >
> > On Sat, Jan 21, 2017 at 3:40 PM, Lei Chang <lei_chang@apache.org> wrote:
> >
> > > hawq_rm_stmt_vseg_memory and hawq_rm_stmt_nvseg need to be used together
> > > to set the specific number of virtual segments and the vseg memory. And
> > > hawq_rm_stmt_nvseg should be less than hawq_rm_nvseg_perquery_perseg_limit.
> > >
> > > set hawq_rm_stmt_vseg_memory = '2GB';
> > > set hawq_rm_stmt_nvseg = 6;
> > >
> > > 16GB looks somewhat small for big dedicated machines: if 16GB is the
> > > per-virtual-segment memory and 8 virtual segments are used, it only
> > > uses 128GB.
> > >
> > > Cheers
> > > Lei
> > >
> > >
> > > On Fri, Jan 20, 2017 at 9:11 PM, Jon Roberts <jroberts@pivotal.io> wrote:
> > >
> > > > Why is there a limit of 16GB for hawq_rm_stmt_vseg_memory?  A cluster
> > > > with 256GB per node and dedicated for HAWQ may certainly want to
> > > > utilize more memory per segment.  Is there something I'm missing
> > > > regarding statement memory?
> > > >
> > > > Secondly, does the number of vsegs for a query get influenced by the
> > > > statement memory or does it just look at the plan and
> > > > hawq_rm_nvseg_perquery_perseg_limit?
> > > >
> > > >
> > > > Jon Roberts
> > > >
> > >
> >
>
