hawq-dev mailing list archives

From Hubert Zhang <hzh...@pivotal.io>
Subject Re: hawq_rm_nvseg_perquery_perseg_limit
Date Wed, 07 Dec 2016 02:45:51 GMT
Hi Jon,
    You mentioned that by increasing the GUC hawq_rm_nvseg_perquery_perseg_limit
to 12 or 24, performance gets improved. That is true, because we use more
processes/resources to finish the query. But the improvement is not linear,
because of process init/communication cost. Moreover, total resources are
limited: if one query uses too many resources, it reduces the concurrency of
the system. The reason for setting this GUC to 6 is also to balance
performance against concurrency.
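
For what it's worth, a minimal sketch of the per-query tuning you described,
done from a psql session (the table name is just an example):

    -- raise the limit for this session only
    SET hawq_rm_nvseg_perquery_perseg_limit = 12;

    -- run the query against the randomly distributed table
    SELECT count(*) FROM store_sales;

    -- go back to the default so concurrent queries are not starved
    RESET hawq_rm_nvseg_perquery_perseg_limit;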


On Wed, Dec 7, 2016 at 7:01 AM, Yi Jin <yjin@pivotal.io> wrote:

> I checked the code for describing a table with the \d and \d+ commands. In
> the current version we cannot get the bucket number through the \d command
> alone. Maybe we can improve this in the future. It should be easy to add,
> because describe uses SQL statements to query the expected information.
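>
> Purely as an illustration of such a query (assuming the catalog columns are
> localoid and bucketnum, and with a made-up table name), something like this
> can fetch the bucket number today:
>
>     SELECT localoid::regclass AS table_name, bucketnum
>     FROM   gp_distribution_policy
>     WHERE  localoid = 'my_hash_table'::regclass;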
>
>
> On Wed, Dec 7, 2016 at 8:20 AM, Jon Roberts <jroberts@pivotal.io> wrote:
>
>> Oh, I missed that there is default_hash_table_bucket_number, which is
>> calculated at init time to be 6 * the number of nodes and is separate from
>> hawq_rm_nvseg_perquery_perseg_limit for random and external tables. So it
>> is already separate.
>>
>> Any chance we can get the bucketnum from gp_distribution_policy displayed
>> in psql when a user uses "\d" to describe the table?
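>>
>> Just to sketch what is already visible today: the cluster-wide default can
>> be checked with SHOW, and the per-table bucket number can be set at create
>> time (treat the WITH (bucketnum=...) clause and the table below as an
>> assumed example rather than verified syntax):
>>
>>     -- default computed at init time as 6 * number of nodes
>>     SHOW default_hash_table_bucket_number;
>>
>>     -- assumed syntax for pinning the bucket number of one hash table
>>     CREATE TABLE sales_hash (id int, amount numeric)
>>     WITH (bucketnum = 12)
>>     DISTRIBUTED BY (id);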
>>
>>
>>
>> Jon Roberts
>> Principal Engineer | jroberts@pivotal.io | 615-426-8661
>>
>> On Tue, Dec 6, 2016 at 2:53 PM, Yi Jin <yjin@pivotal.io> wrote:
>>
>> > Hi Jon,
>> >
>> > To me it is good news that we can increase
>> > hawq_rm_nvseg_perquery_perseg_limit to improve performance when accessing
>> > a randomly distributed table. This limit is just an upper limit for
>> > random tables. In my opinion, it is not active for a hash distributed
>> > table, and it is not considered when deciding the bucket number of a
>> > hash table.
>> >
>> > So, even with a setting as high as the 24 you mentioned, a hash table
>> > always follows its bucket number to acquire virtual segments. Hubert
>> > (hzhang@pivotal.io) can provide you more information on how the bucket
>> > number of a hash distributed table is decided and how the number of
>> > virtual segments is decided for a query accessing tables with mixed
>> > distributions.
>> >
>> > I also want to mention another case: under tight resources or a busy
>> > workload, a query on a randomly distributed table will not get as stable
>> > a number of virtual segments as a hash table, even when
>> > hawq_rm_nvseg_perquery_perseg_limit is set as high as 24.
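>> >
>> > For concreteness, a minimal sketch of the two distribution policies being
>> > compared here (table and column names are made up):
>> >
>> >     -- hash distributed: virtual segments follow the table's bucket number
>> >     CREATE TABLE facts_hash (id int, qty int) DISTRIBUTED BY (id);
>> >
>> >     -- randomly distributed: virtual segments are decided per query,
>> >     -- bounded by hawq_rm_nvseg_perquery_perseg_limit and free resources
>> >     CREATE TABLE facts_rand (id int, qty int) DISTRIBUTED RANDOMLY;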
>> >
>> > Best,
>> > Yi
>> >
>> > On Wed, Dec 7, 2016 at 4:49 AM, Jon Roberts <jroberts@pivotal.io> wrote:
>> >
>> > > I've been testing TPC-DS queries and found that I can get Randomly
>> > > Distributed tables to outperform Hash Distributed tables by increasing
>> > > hawq_rm_nvseg_perquery_perseg_limit on a per-query basis to as high
>> > > as 24.
>> > >
>> > > For Hash Distributed tables, 24 is way too high. It is also not a great
>> > > idea to make the default so high in case users are creating a mix of
>> > > Random and Hash Distributed tables.
>> > >
>> > > Would it be possible to split this one GUC into two, so that you can
>> > > leave it at 6 for Hash Distributed tables but use another value, like
>> > > 16, for Randomly Distributed tables?
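>> > >
>> > > Purely to illustrate the idea (the second GUC name below is hypothetical
>> > > and does not exist today):
>> > >
>> > >     -- existing GUC, left at its default for Hash Distributed tables
>> > >     SET hawq_rm_nvseg_perquery_perseg_limit = 6;
>> > >
>> > >     -- hypothetical new GUC that would apply only to Randomly
>> > >     -- Distributed and external tables
>> > >     SET hawq_rm_nvseg_perquery_perseg_limit_random = 16;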
>> > >
>> > > This enhancement would also make it possible for later improvements in
>> > > the optimizer to determine how many vsegs to use. For example, some
>> > > queries worked best when set to 12 while others greatly benefited when
>> > > set to 24.
>> > >
>> > >
>> > > Jon Roberts
>> > >
>> >
>>
>
>


-- 
Thanks

Hubert Zhang
