kylin-user mailing list archives

From yuzhang <shifengdefan...@163.com>
Subject Re: question related to the aggregation groups configuration
Date Thu, 21 Mar 2019 01:34:50 GMT
Hi Kang-sen:
    I read your email carefully and think I can share some information with you.
    1. You can use the cube planner to view the generated cuboids and their dimension combinations.
Here is the doc: http://kylin.apache.org/docs/tutorial/use_cube_planner.html
    2. I think the number of all combinations of D1-to-D10 is 2^10, not factorial(10), according
to the blog I sent you before. Did I misunderstand you?
    3. I think we can apply those three rules independently, because I found the relevant code
in AggregationGroup.java. If any of mandatory, hierarchy, or joint is not defined,
the code for that rule just returns, which doesn't affect the other defined rules. I tested it just now,
and it works.
    4. According to DefaultCuboidScheduler.java:340, if we set 'kylin.cube.aggrgroup.is-mandatory-only-valid' to true,
Kylin generates the cuboid that contains only the mandatory dimensions. But Kylin's configuration
document says (kylin.cube.aggrgroup.is-mandatory-only-valid: whether to allow Cube contains only
Base Cuboid. The default value is FALSE, set to TRUE when using Spark Cubing), so the doc seems
misleading.
Please kindly correct me if something is wrong.
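
To make the difference in point 2 concrete, here is a minimal sketch that compares the two counts for ten dimensions. The class and method names (CombinationCount, subsets, factorial) are hypothetical and only for illustration; nothing here is Kylin code. It assumes a cuboid is an unordered set of dimensions, so each dimension is simply present or absent.

```java
final class CombinationCount {
    // Number of subsets of n dimensions: each dimension is either in or out,
    // so there are 2^n of them (this count includes the empty set).
    static long subsets(int n) {
        return 1L << n;
    }

    // Number of orderings of n dimensions: n!. A cuboid is an unordered set,
    // so counting orderings does not match the number of combinations.
    static long factorial(int n) {
        long result = 1;
        for (int i = 2; i <= n; i++) {
            result *= i;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("subsets(10)   = " + subsets(10));   // 1024
        System.out.println("factorial(10) = " + factorial(10)); // 3628800
    }
}
```

For ten dimensions the subset count is 1024, while factorial(10) is 3628800, so the two are far apart.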


Best regards 
yuzhang
On 3/20/2019 23:31,Lu, Kang-Sen<klu@rbbn.com> wrote:

Hi, Yuzhang:

 

I found out why, when includes = {D1, … , D10} and mandatory = {D1, …, D9},
we get only one cuboid, {D1, …, D10}, and {D1, …, D9} is NOT generated.

 

It is caused by this code in “core-common/src/main/java/org/apache/kylin/common/KylinConfigBase.java”:

 

    public boolean getCubeAggrGroupIsMandatoryOnlyValid() {
        return Boolean.parseBoolean(getOptional("kylin.cube.aggrgroup.is-mandatory-only-valid", "false"));
    }

 

In kylin.properties, we did not set the parameter “kylin.cube.aggrgroup.is-mandatory-only-valid”
to “true”, so it takes its default value, “false”. As a result, {D1, …, D9} is a so-called “mandatory-only”
cuboid and is treated as not valid.
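
The effect of this flag can be sketched with a simplified model. This is not Kylin's scheduler code: the class MandatoryOnlySketch, the bitmask encoding (bit i set means dimension D(i+1) is present), and the mandatoryOnlyValid parameter are all hypothetical names for illustration. The model assumes only a mandatory rule, keeps every cuboid that contains all mandatory dimensions, always keeps the base cuboid, and keeps the cuboid made of exactly the mandatory dimensions only when the flag is true.

```java
import java.util.ArrayList;
import java.util.List;

final class MandatoryOnlySketch {
    // Bitmask with the first k dimensions (D1..Dk) set.
    static long firstDims(int k) {
        return (1L << k) - 1;
    }

    // Enumerates the cuboids kept for n included dimensions under the simplified model.
    static List<Long> cuboids(int n, long mandatoryMask, boolean mandatoryOnlyValid) {
        List<Long> kept = new ArrayList<>();
        long full = firstDims(n); // base cuboid: all n dimensions
        for (long c = 1; c <= full; c++) {
            if ((c & mandatoryMask) != mandatoryMask) {
                continue; // missing at least one mandatory dimension
            }
            // "Mandatory-only": exactly the mandatory dimensions and nothing else.
            boolean mandatoryOnly = (c == mandatoryMask) && (c != full);
            if (mandatoryOnly && !mandatoryOnlyValid) {
                continue; // pruned when the flag is false (the default)
            }
            kept.add(c);
        }
        return kept;
    }

    public static void main(String[] args) {
        long d1ToD9 = firstDims(9);
        System.out.println(cuboids(10, d1ToD9, false).size()); // 1
        System.out.println(cuboids(10, d1ToD9, true).size());  // 2
    }
}
```

Under this model, includes = {D1, …, D10} with mandatory = {D1, …, D9} yields one cuboid when the flag is false and two when it is true.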

 

Kang-sen

 

 

From: Lu, Kang-Sen <klu@rbbn.com>
Sent: Wednesday, March 20, 2019 8:32 AM
To:user@kylin.apache.org; dev@kylin.apache.org
Subject: RE: question related to the aggregation groups configuration

 

NOTICE: This email was received from an EXTERNAL sender

 

Hi, Yuzhang:

 

Thank you for taking the time to respond. I did read this statement about “mandatory dimension”:
("if a dimension is specified as “mandatory”, then all of the combinations without such
dimension can be pruned"). That is the key piece of information.

BTW: I am curious whether there is an easy way to find out, from Kylin’s metadata, how many cuboids
Kylin generates and what each cuboid’s dimension set is. Your finding is what I suspected,
but I was not able to verify it as you did.

 

We can live with this behavior as is, if it is documented. But it would be better to fix the bug
so that the original description of mandatory stands as correct.

 

About Q2, I read the link you mentioned. It seems that if hierarchy and joint are both specified,
the joint is treated as a tag-along restriction: say D2 is in a hierarchy and becomes
“mandatory” in cuboids; if the joint says D2 and D3 must be together, then D2 will pull D3
into the “mandatory” list. That is elegant.

 

I am wondering why these three selection rules can NOT be applied independently. Suppose we have
D1-to-D10 in the includes set. Then the number of all combinations of D1-to-D10 is factorial(10).
Now we can apply “mandatory” to prune some of the combinations. After that,
we may prune further by applying the hierarchy and joint rules. Isn’t that possible?
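
A rough sketch of this idea treats each rule as an independent filter over the candidate dimension sets. It is only an illustration under stated assumptions, not how Kylin actually generates cuboids. The class IndependentRulesSketch and all its methods are hypothetical. The rule semantics are assumptions too: mandatory means the dimension is always present, hierarchy means a lower level appears only together with every level above it, and joint means the dimensions appear all together or not at all. Candidates are counted as unordered non-empty sets, which gives 2^10 - 1 = 1023 for ten dimensions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.LongPredicate;

final class IndependentRulesSketch {
    // Builds a bitmask from 1-based dimension numbers (1 means D1).
    static long mask(int... dims) {
        long m = 0;
        for (int d : dims) {
            m |= 1L << (d - 1);
        }
        return m;
    }

    // Mandatory: every listed dimension must be present.
    static LongPredicate mandatory(long mandatoryMask) {
        return c -> (c & mandatoryMask) == mandatoryMask;
    }

    // Hierarchy, listed from highest to lowest level: a level may be present
    // only if all levels above it are present too.
    static LongPredicate hierarchy(int... levels) {
        return c -> {
            boolean missingAbove = false;
            for (int d : levels) {
                boolean present = (c & (1L << (d - 1))) != 0;
                if (present && missingAbove) {
                    return false;
                }
                if (!present) {
                    missingAbove = true;
                }
            }
            return true;
        };
    }

    // Joint: the listed dimensions appear all together or not at all.
    static LongPredicate joint(long jointMask) {
        return c -> (c & jointMask) == 0 || (c & jointMask) == jointMask;
    }

    // Keeps every non-empty subset of n dimensions that passes all rules.
    static List<Long> prune(int n, List<LongPredicate> rules) {
        List<Long> kept = new ArrayList<>();
        for (long c = 1; c < (1L << n); c++) {
            final long cuboid = c;
            if (rules.stream().allMatch(r -> r.test(cuboid))) {
                kept.add(c);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<LongPredicate> rules = Arrays.asList(
                mandatory(mask(1)), hierarchy(2, 3, 4), joint(mask(5, 6)));
        System.out.println(prune(10, rules).size()); // 128
    }
}
```

Because each rule is a separate predicate, leaving one out simply means one fewer filter; with mandatory {D1}, hierarchy D2 > D3 > D4 and joint {D5, D6}, the 1023 candidates shrink to 128 in this model.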

 

Thanks.

 

Kang-sen

 

From: yuzhang <shifengdefannao@163.com>
Sent: Wednesday, March 20, 2019 1:00 AM
To:user@kylin.apache.org; dev@kylin.apache.org
Subject: Re: question related to the aggregation groups configuration

 


 

Hi kang-sen:

    I ran some tests for Q1. With {D1 to D10} included in an aggregation group and {D1
to D9} set as mandatory dimensions, Kylin generates only the cuboid {D1 to D10} (the base
cuboid), whereas I expected {D1 to D10} and {D1 to D9}. When I set {D1 to D8} as mandatory dimensions,
Kylin generates the cuboids {D1 to D10}, {D1 to D8, D9} and {D1 to D8, D10}, whereas I expected {D1
to D10}, {D1 to D8, D9}, {D1 to D8, D10} and {D1 to D8}. So for your Q1, I think the answer
is that ONLY ONE cuboid, {D1 to D10}, is generated. But according to the blog ("if a dimension
is specified as “mandatory”, then all of the combinations without such dimension can be
pruned"), the cuboid {D1 to D9} shouldn't be pruned. Maybe someone else can give more detail.

    Q2 is similar to this email https://lists.apache.org/thread.html/3ccc8d7f98748d7c590c01c7da6ce666a16c4fe2b34be070940cae8f@%3Cuser.kylin.apache.org%3E
and the JIRA https://issues.apache.org/jira/browse/KYLIN-2149 . Currently Kylin prevents configuring
overlapping hierarchy, mandatory and joint dimensions. Although the ideas behind the three aggregation rules are
different and even contradictory, automatically merging those rules when generating cuboids is feasible. For now,
the restrictions on aggregation groups mean that your requirement, which I think is a common one, can't be met.
Maybe the JIRA KYLIN-2149 will be resolved in the future.

 

                                                       Best regards

                                                            yuzhang

 

 


On 3/19/2019 23:09,Lu, Kang-Sen<klu@rbbn.com> wrote:

Hi, Yuzhang:

 

I would appreciate it if you could answer my 2 questions.

 

Thanks.

 

Kang-sen

 

From: Lu, Kang-Sen <klu@rbbn.com>
Sent: Friday, March 15, 2019 8:15 AM
To:user@kylin.apache.org
Subject: RE: question related to the aggregation groups configuration

 


 

Hi, Yuzhang:

 

Thanks for taking the time to reply.

 

I had actually read that article several times before.

 

However, maybe I missed some details; I am not clear about how those rules actually
work and how they interact with each other.

 

From the article you pointed out, the hierarchy rule does have an example, so it is less likely
to be confused.

 

I did not find any discussion about the “mandatory rule”. It is supposed to be very simple,
but I am stuck on the details. Let’s say “includes” is a set of dims: {D1, D2, …,
D10}, and “mandatory” is a set of dims: {D1, …, D9}.

So it is obvious that every cuboid generated from this agg group should include the set of
dims {D1, …, D9}.

Now, D10 could be either selected or not. So the natural guess is that this agg group will
generate two cuboids, i.e. {D1, …, D9} and {D1, …, D10}. Is this what Kylin will do?

 

Another detail I am not clear about is the interaction of the “joint rule” and the “mandatory
rule”. It seems that there is an interaction between these two rules. I am not clear why,
and it is not discussed in the article you mentioned.

 

Those were my two original questions.

 

Thanks again.

 

Kang-sen

 

From: yuzhang <shifengdefannao@163.com>
Sent: Friday, March 15, 2019 7:46 AM
To:user@kylin.apache.org
Subject: Re: question related to the aggregation groups configuration

 


 

Hi kang-sen:

  Here is a blog post about the idea behind aggregation groups. I hope it helps you.

https://kylin.apache.org/blog/2016/02/18/new-aggregation-group/

 

Best regards

 yuzhang

 


On 3/14/2019 21:21,Lu, Kang-Sen<klu@rbbn.com> wrote:

I am running kylin 2.5.1

 

I have two questions related to the aggregation group configuration. In the kylin GUI, select
“Model”, then try to edit a cube design, under “Grid”, select “Advanced Setting”,
we can enter multiple “Aggregation Groups”. Each “Aggregation Group” can specify zero,
one, or many cuboids, with the combination of dimensions.

 

Q1: If I want one and only one cuboid to be created with dimension set = {D1, D2, … , D10},
then is it correct to enter D1-to-D10 in the “includes” list, and D1-to-D9 in the “Mandatory
Dimensions” list? The key question is: will Kylin generate two cuboids, i.e. {D1, …,
D9} and {D1, … , D10}, or just one cuboid?

 

Q2: If I enter D1-to-D10 into the “includes” list, and enter {D1, D2} in the “Joint
Dimensions” list, then I can’t enter either D1 or D2 into the “Mandatory Dimensions”
list? I was thinking that if I entered {D1, D3, … , D9} in the “Mandatory Dimensions”, with
{D1, D2} in the “Joint Dimensions”, then there should be only one cuboid generated, for
{D1, D2, …, D10}. Why is it not allowed?

 

Maybe the doc describes this. But it is not clear to me exactly how
Kylin processes the info entered in the “includes”, “Mandatory Dimensions”, and “Joint
Dimensions” lists. Can someone either point me to some document or answer the questions I mentioned
above?

 

Thanks.

 

Kang-sen

 

 
