kylin-user mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Understanding aggregation group settings
Date Tue, 11 Jul 2017 14:58:09 GMT
hi Stefan,

Your question is quite long; today I can only answer your first question:

Q:  "We have understood the concept behind derived dimensions and
hierarchies, but we cannot figure out how to implement it with the given
UI."

Answer:
1) A "derived dimension" can only come from a lookup table, because Kylin
takes a snapshot of each lookup table; with that snapshot in memory, Kylin
can "derive" the other lookup-table columns from the PK/FK at query time.
That is the background. Now, when you add a dimension from a lookup table,
you will see an option to select "derived" or "normal". If you select
"derived", Kylin puts the FK column in the cube as the physical dimension
and remembers to use it to map this dimension at runtime. If you select
"normal", the column itself becomes a physical dimension.
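To make the "deriving" idea concrete, here is a minimal Python sketch (not
Kylin's actual code): the lookup-table snapshot is modeled as a hypothetical
in-memory dict, the cuboid stores only the FK, and the other columns are
looked up from the snapshot at query time.

```python
# Hypothetical snapshot of a lookup table (e.g. D_GEOGRAPHY): PK -> other columns.
# Kylin keeps such a snapshot in memory; the cube itself stores only the FK.
geo_snapshot = {
    1: {"country": "DE", "city": "Hannover"},
    2: {"country": "DE", "city": "Berlin"},
}

def derive(fk, column):
    """Map a fact-table FK to a lookup-table column via the snapshot.

    This is the runtime "deriving" step: no extra physical dimension is
    needed for `column`, because it is recoverable from the FK.
    """
    return geo_snapshot[fk][column]

print(derive(2, "city"))   # -> Berlin
print(derive(1, "country"))  # -> DE
```

The trade-off: a derived dimension costs no cuboid space, but every query on
it goes through the snapshot lookup instead of a precomputed aggregation.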

2) A hierarchy is a relationship among several dimensions. You need to
define the relationship in the "Advanced Settings" step: within an
aggregation group, put the dimensions into a "hierarchy" in order from
parent to child. For example: country, city, street.
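The point of a hierarchy is cuboid pruning: only prefix combinations of the
hierarchy are materialized. A small Python sketch (illustrative only, with
made-up dimension names) enumerates the valid cuboids for one aggregation
group containing a free dimension plus a 3-level hierarchy:

```python
from itertools import combinations

# One aggregation group: a free ("normal") dimension plus a hierarchy.
dims = ["date", "country", "city", "street"]
hierarchy = ["country", "city", "street"]  # parent -> child order

def valid(cuboid):
    """A cuboid is valid if the hierarchy dims it contains form a prefix
    of the hierarchy (e.g. city never appears without country)."""
    picked = [d for d in hierarchy if d in cuboid]
    return picked == hierarchy[:len(picked)]

cuboids = [c for r in range(1, len(dims) + 1)
           for c in combinations(dims, r) if valid(set(c))]

# 4 dims would give 2**4 - 1 = 15 non-empty cuboids without the hierarchy;
# the hierarchy prunes this to 7.
print(len(cuboids))  # -> 7
```

In general a k-level hierarchy contributes k + 1 prefix choices (including
"none") instead of 2**k subsets, which is where the cuboid savings come from.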

You can check the sample cube that ships with Kylin as an example.


2017-07-07 16:59 GMT+08:00 Hüls, Stefan <stefan.huels@tui.com>:

> Hi there,
>
>
>
> we are currently investigating Kylin to help us bring our OLAP environment
> to hadoop.
>
>
>
> We have a very traditional datamart approach modelled in our DWH, but
> struggle to understand how to model cubes in Kylin.
>
> We have understood the concept behind derived dimensions and hierarchies,
> but we cannot figure out how to implement it with the given UI.
>
> I hope you can give us some hints or insights on how to do an optimal cube
> build.
>
>
>
> We have one fact table (F) which has measures and one ID attribute per
> dimension table. The IDs are the FK for the dimension tables.
>
> Like in the "Optimize Cube Design" article, this is a typical example:
>
>
>
> Fact                                             Dimensions
>
> measure1, measure2, ..., measureX, FK1, FK2, FK3
>   FK1 <-inner join-> PK1: H1                        (D_TIME_DIMENSION)
>   FK2 <-inner join-> PK2: H1, H2, H3, H4, H5, H6    (D_GEOGRAPHY)
>   FK3 <-inner join-> PK3: H1, H2, H3, H4            (D_DATE)
>
> where PK has a 1-to-1 relationship to H1 in every dimension.
>
>
>
> 1.) How to model the dimension for D_TIME_DIMENSION. Is it PK as normal
> and H1 as derived dimension, or do we ignore PK1 and just model H1 as
> normal dimension?
>
> 2.) How to model the dimension for D_GEOGRAPHY. Are H1 to H6 normal
> dimensions and should be modelled as a hierarchy later, or is any dimension
> derived?
>
> 3.) The same question arises for D_DATE.
>
>
>
> The next questions arise regarding the aggregation groups. Since in your
> description, every table attribute that is used in the cube model is called
> "dimension", the
>
> question comes up which of these dimensions from the above example should
> be included in aggregation groups and how.
>
>
>
> I would understand that if we do not define an aggregation group, we would
> have 2^11 cuboids to be created (if every Hx dimension is "normal").
>
> Assume that for lookup table D_GEOGRAPHY, the H1 dimension has a high
> cardinality and H2 to H6 are subgroups of H1 (i.e. levels in a geo tree).
>
> H1 of D_TIME_DIMENSION is mandatory in all analyses.
>
> H1 of D_DATE is a classic date field, with H2 to H4 being WEEK, MONTH and
> YEAR.
>
> How is the aggregation group UI meant to be used in this example? I
> understand that aggregation groups are a white-list to reduce the number of
> cuboids?
>
>
>
> In our current IBM Cognos Transformer Powercubes environment, we have
> models with about 30 dimension tables and a total of about 90 levels. The
> fact table has about 60 million rows.
>
> The final cube has a size of about 10 GB.
>
> This would result in about 2^30 to 2^90 cuboids in Kylin and is impossible
> to produce. Do you know what kind of algorithm is used to produce the IBM
> Powercubes?
>
> Access time is not very good on the lower aggregation layers, but the top
> level aggregations are fast. The software is about 15 years old and has its
> own issues, but
>
> building cubes with massive dimension numbers works fine as long as there
> are not too many rows in the fact table and the filesize of a single file
>
> does not exceed 2GB at any time.
>
>
>
> We would like to somehow model these kinds of aggregations, but we think we
> would have to think differently to achieve this.
>
> We currently need about 4 hours on a 2-core machine with 8 GB RAM to create
> these Cognos cubes. How is this even possible, and what does Kylin do
> differently?
>
>
>
> We are currently using Kylin 2.0 on a Cloudera 5.11 cluster.
>
>
>
> Any help would be much appreciated.
>
> Best regards,
>
> Stefan
>
>
>



-- 
Best regards,

Shaofeng Shi 史少锋
