hawq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jon Roberts <jrobe...@pivotal.io>
Subject Re: hawq_rm_nvseg_perquery_perseg_limit
Date Tue, 06 Dec 2016 21:20:49 GMT
Oh, I missed that there is default_hash_table_bucket_number which is
calculated at init time to be 6 * number of nodes which is separate from
hawq_rm_nvseg_perquery_perseg_limit for random and external tables.  So it
is already separate.

Any chance we can get the bucketnum from gp_distribution_policy displayed
in psql when a user uses "\d" to define the table?



Jon Roberts
Principal Engineer | jroberts@pivotal.io | 615-426-8661

On Tue, Dec 6, 2016 at 2:53 PM, Yi Jin <yjin@pivotal.io> wrote:

> Hi Jon,
>
> I think to me it is a good news that we can increase
> hawq_rm_nvseg_perquery_perseg_limit to improve performance when accessing
> randomly distributed table. I think this limit is just a upper limit for
> random table. In my opinion, it is not active when considering a hash
> distributed table, and it is not considered when deciding the bucket number
> of a hash table.
>
> So, even setting 24 as high as you mentioned, I think hash table always
> follows its bucket number to acquire virtual segments. I think Hubert (
> hzhang@pivotal.io) can provide you more information how to decide bucket
> number of a hash distributed table and how to decide number of virtual
> segments for a query accessing mixed distributed tables.
>
> I want to mention another case that we have tight resource or busy
> workload, query for random distributed table will not get stable number of
> virtual segments as a hash table, even when hawq_rm_nvseg_perquery_perseg_
> limit
> is set as high as 24.
>
> Best,
> Yi
>
> On Wed, Dec 7, 2016 at 4:49 AM, Jon Roberts <jroberts@pivotal.io> wrote:
>
> > I've been testing TPC-DS queries and found that I can get Randomly
> > Distributed tables to outperform Hash Distributed tables by increasing
> > hawq_rm_nvseg_perquery_perseg_limit on a per query basis to as high as
> 24.
> >
> > For Hash Distributed tables, 24 is way too high.  It is also not a great
> > idea to make the default so high in case users are creating a mix of
> Random
> > and Hash Distributed Tables.
> >
> > Would it be possible to make this one GUC separated into two so that you
> > can leave it 6 for Hash Distributed tables but another value like 16 for
> > Randomly Distributed tables?
> >
> > This enhancement would also make it possible for later improvements in
> the
> > optimizer to determine how many vsegs to use.  For example, some queries
> > worked best set to 12 while others greatly benefited when set to 24.
> >
> >
> > Jon Roberts
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message