hawq-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Yi Jin <y...@pivotal.io>
Subject Re: hawq_rm_nvseg_perquery_perseg_limit
Date Tue, 06 Dec 2016 23:01:24 GMT
I checked the code for describing table using commands \d and \d+. I think
in current version we cannot get the bucket number through just \d command.
Maybe we can get a chance to improve this in future. This should be easy to
add because describe uses sql statements to query expected information.


On Wed, Dec 7, 2016 at 8:20 AM, Jon Roberts <jroberts@pivotal.io> wrote:

> Oh, I missed that there is default_hash_table_bucket_number which is
> calculated at init time to be 6 * number of nodes which is separate from
> hawq_rm_nvseg_perquery_perseg_limit for random and external tables.  So it
> is already separate.
>
> Any chance we can get the bucketnum from gp_distribution_policy displayed
> in psql when a user uses "\d" to define the table?
>
>
>
> Jon Roberts
> Principal Engineer | jroberts@pivotal.io | 615-426-8661
>
> On Tue, Dec 6, 2016 at 2:53 PM, Yi Jin <yjin@pivotal.io> wrote:
>
> > Hi Jon,
> >
> > I think to me it is a good news that we can increase
> > hawq_rm_nvseg_perquery_perseg_limit to improve performance when
> accessing
> > randomly distributed table. I think this limit is just a upper limit for
> > random table. In my opinion, it is not active when considering a hash
> > distributed table, and it is not considered when deciding the bucket
> number
> > of a hash table.
> >
> > So, even setting 24 as high as you mentioned, I think hash table always
> > follows its bucket number to acquire virtual segments. I think Hubert (
> > hzhang@pivotal.io) can provide you more information how to decide bucket
> > number of a hash distributed table and how to decide number of virtual
> > segments for a query accessing mixed distributed tables.
> >
> > I want to mention another case that we have tight resource or busy
> > workload, query for random distributed table will not get stable number
> of
> > virtual segments as a hash table, even when
> hawq_rm_nvseg_perquery_perseg_
> > limit
> > is set as high as 24.
> >
> > Best,
> > Yi
> >
> > On Wed, Dec 7, 2016 at 4:49 AM, Jon Roberts <jroberts@pivotal.io> wrote:
> >
> > > I've been testing TPC-DS queries and found that I can get Randomly
> > > Distributed tables to outperform Hash Distributed tables by increasing
> > > hawq_rm_nvseg_perquery_perseg_limit on a per query basis to as high as
> > 24.
> > >
> > > For Hash Distributed tables, 24 is way too high.  It is also not a
> great
> > > idea to make the default so high in case users are creating a mix of
> > Random
> > > and Hash Distributed Tables.
> > >
> > > Would it be possible to make this one GUC separated into two so that
> you
> > > can leave it 6 for Hash Distributed tables but another value like 16
> for
> > > Randomly Distributed tables?
> > >
> > > This enhancement would also make it possible for later improvements in
> > the
> > > optimizer to determine how many vsegs to use.  For example, some
> queries
> > > worked best set to 12 while others greatly benefited when set to 24.
> > >
> > >
> > > Jon Roberts
> > >
> >
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message