kylin-user mailing list archives

From Ajay Chitre <chitre.a...@gmail.com>
Subject Re: New document: "How to optimize cube build"
Date Thu, 09 Feb 2017 01:20:30 GMT
My question was a general one, not about any specific issue that I am
encountering -:)

I understand that we can prune by using hierarchical dimensions,
aggregation groups, etc. But what if these types of aggregations are not
possible?

Let's say I have 15 dimensions (and I can't prune any). Would Kylin build
all 32,767 cuboids (2^15 - 1), or is there a property to say "if the number
of dimensions is over X, stop building more cuboids and get the results from
the base cuboid"? (I know this will slow down the queries.)
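
For reference, here is a minimal sketch of the arithmetic behind this question. It is not Kylin code: the function names are hypothetical, and the grouped count assumes a simplified model in which the aggregation groups are disjoint, each group is built as a full cube over its own dimensions, and the base cuboid is always added. Hierarchy, joint, and mandatory dimensions would prune further and are not modeled here.

```python
from typing import Sequence


def full_cube_cuboids(n_dims: int) -> int:
    """Cuboids in a full cube: one per non-empty subset of the dimensions."""
    if n_dims < 0:
        raise ValueError("n_dims must be non-negative")
    return 2 ** n_dims - 1


def grouped_cuboids(group_sizes: Sequence[int]) -> int:
    """Cuboid count under the simplified disjoint-group model (an assumption).

    Each group contributes the non-empty subsets of its own dimensions.
    With more than one group, the base cuboid (all dimensions together)
    belongs to no single group, so it is counted once on top.
    """
    total = sum(full_cube_cuboids(size) for size in group_sizes)
    if len(group_sizes) > 1:
        total += 1  # base cuboid spanning all groups
    return total


if __name__ == "__main__":
    print(full_cube_cuboids(7))     # 127
    print(full_cube_cuboids(15))    # 32767
    print(grouped_cuboids([8, 7]))  # 255 + 127 + 1 = 383
```

Under these assumptions, splitting 15 dimensions into groups of 8 and 7 drops the count from 32,767 to 383, at the cost that queries mixing dimensions from both groups fall back to the base cuboid.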

Please let me know. Thanks.


On Mon, Feb 6, 2017 at 5:43 AM, ShaoFeng Shi <shaofengshi@gmail.com> wrote:

> Ajay, thanks for your feedback;
>
> For question 1, the code has been merged into the master branch; the next
> release will be 2.0, and a beta release will be published soon.
>
> For question 2, yes, your understanding is correct: an N-dimension FULL
> cube will have 2^N - 1 cuboids. But if you use a hierarchy, joint
> dimensions, or separate the dimensions into multiple groups, it will be a
> "partial" cube, which means some cuboids will be pruned.
>
> If a query uses dimensions across aggregation groups, then only the base
> cuboid can fulfill it; Kylin has to do the post-aggregation from the base
> cuboid, and performance will be degraded. Please check whether this is the
> case on your side.
>
> On Mon, Feb 6, 2017 at 2:05 PM +0900, "Ajay Chitre" <chitre.ajay@gmail.com> wrote:
>
>> Thanks for writing this document. It's very helpful. I have the following
>> questions:
>>
>> 1) Doc says... "Kylin will build dictionaries in memory (in next version
>> this will be moved to MR)".
>>
>> Which version can we expect this in? For large cubes this process takes a
>> long time on the local machine. We really need to move this to the Hadoop
>> cluster. In fact, it would be great if we could have an option to run this
>> under Spark -:)
>>
>> 2) About the "Build N-Dimension Cuboid" step.
>>
>> Does Kylin build ALL Cuboids? My understanding is:
>>
>> Total no. of Cuboids = (2 to the power of # of dimensions) - 1
>>
>> Correct?
>>
>> So if there are 7 dimensions, there will be 127 Cuboids, right? Does
>> Kylin create ALL of them?
>>
>> I was under the impression that, after some point, Kylin will just get
>> measures from the Base Cuboid; instead of building all of them. Please
>> explain.
>>
>> Thanks.
>>
>>
>>
>> On Sat, Feb 4, 2017 at 2:19 AM, Li Yang <liyang@apache.org> wrote:
>>
>>> Feel free to update the document with different opinions. :-)
>>>
>>> On Thu, Jan 26, 2017 at 11:34 AM, ShaoFeng Shi <shaofengshi@apache.org>
>>> wrote:
>>>
>>>> Hi Alberto,
>>>>
>>>> Thanks for your comments! In many cases the data is imported to Hadoop
>>>> in T+1 mode. Especially when each day's data is tens of GB, it is
>>>> reasonable to partition the Hive table by date. The question is whether
>>>> it is worth keeping a long history of data in Hive. Usually users keep
>>>> only a couple of months of data in Hive. If the number of partitions
>>>> exceeds the threshold in Hive, the user can easily remove the oldest
>>>> partitions or move them to another table. That is a common practice in
>>>> Hive, I think, and it is very good to know that Hive 2.0 will solve this.
>>>>
>>>> 2017-01-25 17:10 GMT+08:00 Alberto Ramón <a.ramonportoles@gmail.com>:
>>>>
>>>>> Be careful about partitioning by "FLIGHTDATE".
>>>>>
>>>>> From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>>>>>
>>>>> *"Option 1: Use id_date as partition column on Hive table. This have a
>>>>> big problem: the Hive metastore is meant for few hundred of partitions not
>>>>> thousand (Hive 9452 there is an idea to solve this isn’t in progress)"*
>>>>>
>>>>> Hive 2.0 will include a preview (for testing only) that solves this.
>>>>>
>>>>> 2017-01-25 9:46 GMT+01:00 ShaoFeng Shi <shaofengshi@apache.org>:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> A new document has been added on cube build practices. Any
>>>>>> suggestions or comments are welcome. We can update the doc later based
>>>>>> on the feedback.
>>>>>>
>>>>>> Here is the link:
>>>>>> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>>>>>>
>>>>>> --
>>>>>> Best regards,
>>>>>>
>>>>>> Shaofeng Shi 史少锋
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
>>>
>>
