kylin-user mailing list archives

From Billy Liu <billy...@apache.org>
Subject Re: Cube tunning
Date Fri, 09 Dec 2016 06:11:48 GMT
You are correct. Putting dimensions that are queried together into one aggr
group is right. Suppose you have dimensions A, B and C. If A and B are in aggr
group 1, and B and C are in group 2, then Kylin will pre-compute AB and BC, but
not AC. That means a query by AB will respond quickly, but for a query by AC,
Kylin has to post-aggregate the result from other cuboids at query time, which
slows down the query.
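
To make the A/B/C example concrete, here is a minimal sketch under a simplified
model: each aggr group is assumed to pre-compute every non-empty combination of
its own dimensions, and mandatory, hierarchy and joint rules are ignored. The
function names are hypothetical and are not part of Kylin.

```python
from itertools import combinations

def cuboids_for_groups(groups):
    """Return the dimension combinations covered by the given aggr groups.

    Simplified model (an assumption, not Kylin's actual planner): each group
    contributes every non-empty subset of its own dimensions.
    """
    covered = set()
    for group in groups:
        dims = sorted(group)
        for size in range(1, len(dims) + 1):
            for combo in combinations(dims, size):
                covered.add(frozenset(combo))
    return covered

# Group 1 holds A and B, group 2 holds B and C, as in the example above.
groups = [{"A", "B"}, {"B", "C"}]
covered = cuboids_for_groups(groups)

def is_precomputed(query_dims):
    """True if the exact dimension combination is covered by some group."""
    return frozenset(query_dims) in covered

print(is_precomputed({"A", "B"}))  # True: AB is inside group 1
print(is_precomputed({"A", "C"}))  # False: no group contains both A and C
```

Under this model, AC is the only two-dimension combination that no group
covers, which is why that query has to be aggregated at query time.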

You could define multiple hierarchies in different aggr groups; most of the
time, they give the same performance as defining them in one aggr group. They
are rules that tell Kylin how to combine the dimensions.
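
The cuboid-count arithmetic from my earlier message (quoted below) can be
sketched as follows. The dimension counts used here (12 dimensions, split into
three groups of 4) are illustrative assumptions, not figures from this cube,
and the formula is the rough estimate 2^N1+2^N2+2^N3, which ignores cuboids
shared between groups.

```python
def cuboids_single_group(n):
    """Estimated cuboid count when all n dimensions sit in one aggr group."""
    return 2 ** n

def cuboids_split_groups(sizes):
    """Estimated cuboid count when dimensions are split into several groups.

    Follows the 2^N1 + 2^N2 + 2^N3 estimate; overlap between groups is ignored.
    """
    return sum(2 ** k for k in sizes)

# Hypothetical example: 12 dimensions in one group vs. three groups of 4.
print(cuboids_single_group(12))         # 4096
print(cuboids_split_groups([4, 4, 4]))  # 48
```

The smaller cube comes at the price described above: combinations that span
two groups are no longer pre-computed.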

2016-12-09 13:33 GMT+08:00 peter zhang <peter.zhang9211@gmail.com>:

> Billy, thanks very much for your quick reply.
>
> In my case, I logically split the aggregation groups by dimension category.
> As you can see in my schema JSON, all the date-related columns are in one
> group, all the payment-way-related dimensions are in another group, all the
> other junk dimensions that are not used frequently are in a group defined as
> a joint group, and so on (there are 6 groups in my setting).
> If I understand your explanation correctly, *is it more reasonable to
> put the dimensions that are often used in one query in the same group? For
> example, I often query payment way by day, so the payment way dimension and
> the date dimension should be put in the same group.*
>
> Another big question: there can be multiple hierarchies / joint dimensions
> in one group. Why do multiple aggregation groups exist? In other
> words, *we can define multiple hierarchy dimensions in one group rather than
> creating multiple groups.*
>
> 2016-12-09 12:32 GMT+08:00 Billy Liu <billyliu@apache.org>:
>
>> Suppose you have N dimensions, all in one agg group; then the total number
>> of cuboids will be 2^N.
>> But if you split N into N1, N2, N3, where N1+N2+N3>=N, then the total number
>> of cuboids will be 2^N1+2^N2+2^N3.
>> You can see how much of an improvement this could be.
>>
>> How to split the agg groups depends on what your queries look like. Maybe
>> you could share with us what kinds of queries they are.
>>
>> 2016-12-09 11:44 GMT+08:00 peter zhang <peter.zhang9211@gmail.com>:
>>
>>> I built a cube. The first time, without any tuning and with no aggregation
>>> group setting, the cube size was about 20G.
>>> Then, following the tuning document, I added some aggregation groups, and
>>> the cube was reduced to 71.02 MB. Unfortunately, query performance is also
>>> worse than before; most queries now take more than 10 seconds.
>>> I don't know what the difference is between adding all columns to one new
>>> agg group and splitting the columns into different groups. In my practice,
>>> I created 6 agg groups.
>>>
>>> Can anyone help check my JSON schema?
>>>
>>> Thanks in advance.
>>>
>>>
>>
>
