kylin-user mailing list archives

From Ajay Chitre <chitre.a...@gmail.com>
Subject Re: New document: "How to optimize cube build"
Date Mon, 06 Feb 2017 05:05:25 GMT
Thanks for writing this document. It's very helpful. I have the following
questions:

1) Doc says... "Kylin will build dictionaries in memory (in next version
this will be moved to MR)".

Which version can we expect this in? For large cubes this step takes a
long time on the local machine. We really need to move it to the Hadoop
cluster. In fact, it would be great to have an option to run it
under Spark :-)

2) About the "Build N-Dimension Cuboid" step.

Does Kylin build ALL Cuboids? My understanding is:

Total number of cuboids = 2^N - 1, where N is the number of dimensions

Correct?

So if there are 7 dimensions, there will be 127 Cuboids, right? Does Kylin
create ALL of them?

I was under the impression that, after some point, Kylin would just get
the measures from the base cuboid instead of building all of them. Please
explain.
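
To make the arithmetic in question 2 concrete, here is a minimal sketch in Python. It only illustrates my formula (every non-empty combination of dimensions counts as one cuboid); it is not a statement of what Kylin actually builds, and the names `count_cuboids`, `enumerate_cuboids` and the dimension labels D1..D7 are made up for the example.

```python
from itertools import combinations


def count_cuboids(num_dimensions: int) -> int:
    """Count the non-empty subsets of the dimensions: 2^N - 1."""
    return 2 ** num_dimensions - 1


def enumerate_cuboids(dimensions):
    """List every non-empty combination of dimensions, largest first.

    The first entry holds all dimensions (the base cuboid in the
    terminology of this thread); the last entries hold one dimension each.
    """
    cuboids = []
    for size in range(len(dimensions), 0, -1):
        cuboids.extend(combinations(dimensions, size))
    return cuboids


# Hypothetical dimension names, for illustration only.
dims = ["D1", "D2", "D3", "D4", "D5", "D6", "D7"]
all_cuboids = enumerate_cuboids(dims)

print(count_cuboids(len(dims)))  # 127
print(len(all_cuboids))          # 127
print(all_cuboids[0])            # the combination containing all 7 dimensions
```

Each extra dimension roughly doubles the count under this formula, which is why I am asking whether all of them really get built.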

Thanks.



On Sat, Feb 4, 2017 at 2:19 AM, Li Yang <liyang@apache.org> wrote:

> Feel free to update the document with different opinions. :-)
>
> On Thu, Jan 26, 2017 at 11:34 AM, ShaoFeng Shi <shaofengshi@apache.org>
> wrote:
>
>> Hi Alberto,
>>
>> Thanks for your comments! In many cases the data is imported to Hadoop in
>> T+1 mode. Especially when each day's data is tens of GB, it is
>> reasonable to partition the Hive table by date. The question is whether it
>> is worth keeping a long history of data in Hive. Usually users keep only a
>> couple of months' data in Hive. If the number of partitions exceeds the
>> threshold in Hive, they can easily remove the oldest partitions or move
>> them to another table. That is a common Hive practice, I think, and it is
>> very good to know that Hive 2.0 will solve this.
>>
>> 2017-01-25 17:10 GMT+08:00 Alberto Ramón <a.ramonportoles@gmail.com>:
>>
>>> Be careful about partition by "FLIGHTDATE"
>>>
>>> From https://github.com/albertoRamon/Kylin/tree/master/KylinPerformance
>>>
>>> *"Option 1: Use id_date as partition column on Hive table. This have a
>>> big problem: the Hive metastore is meant for few hundred of partitions not
>>> thousand (Hive 9452 there is an idea to solve this isn’t in progress)*"
>>>
>>> Hive 2.0 will include a preview (for testing only) of a fix for this.
>>>
>>> 2017-01-25 9:46 GMT+01:00 ShaoFeng Shi <shaofengshi@apache.org>:
>>>
>>>> Hello,
>>>>
>>>> A new document has been added on cube build practices. Any suggestions
>>>> or comments are welcome. We can update the doc later based on feedback.
>>>>
>>>> Here is the link:
>>>> https://kylin.apache.org/docs16/howto/howto_optimize_build.html
>>>>
>>>> --
>>>> Best regards,
>>>>
>>>> Shaofeng Shi 史少锋
>>>>
>>>>
>>>
>>
>>
>> --
>> Best regards,
>>
>> Shaofeng Shi 史少锋
>>
>>
>
