kylin-user mailing list archives

From Roberto Tardío Olmos <>
Subject Aggregation Groups and High Cardinality Dimension
Date Mon, 24 Apr 2017 11:14:18 GMT
Hi Kylin Community,

I have some questions about how aggregation groups work and about good
practices for high-cardinality dimensions (HCD):

1. I have created a cube with two aggregation groups. The first
aggregation group includes the Time and Company dimensions. The second
aggregation group includes the Customer dimension, which has a
cardinality of about 1 million. This dimension is used less frequently
than the dimensions in the first aggregation group, and a filter (on
some IDs) is always applied to it.

After building the cube, I can execute queries that combine dimensions
from both aggregation groups if I need to. However, query latency is
much worse than when I define all three dimensions together in the same
aggregation group. I guess this is because the aggregation happens at
query time, since those combinations are not precalculated as they are
when all three dimensions are in the same aggregation group.

How are the two aggregation groups combined at query execution? I
suppose that the FK reference of every fact involved in the query result
is stored and known by both aggregation groups. I would like to know in
more detail how this works.

2. If I have an HCD dimension that is rarely used in queries, and a
Customer ID filter is always applied to it to get data for only some
customers, is it good practice to define it in a separate aggregation
group?
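To make the two questions above concrete, here is a simplified Python model of how aggregation groups turn into precomputed cuboids and how a query might be routed to one of them. It is a sketch under stated assumptions, not Kylin's actual cuboid scheduler: the helpers `build_cuboids` and `route_query` are hypothetical, each group is assumed to have no mandatory, hierarchy or joint rules, the empty cuboid is skipped, and the query is routed to the precomputed cuboid with the fewest dimensions that still contains every queried dimension. The dimension names `TIME`, `COMPANY` and `CUSTOMER` stand for the three dimensions described above. In this model, splitting Customer into its own group means Time+Customer is never precomputed, so such a query falls back to the base cuboid and Company has to be aggregated away at query time.

```python
from itertools import combinations

# Simplified model (assumption): each aggregation group has no mandatory,
# hierarchy or joint rules, so it yields every non-empty subset of its
# dimensions. The base cuboid (all dimensions) is always built.
def build_cuboids(agg_groups, all_dims):
    cuboids = {frozenset(all_dims)}
    for group in agg_groups:
        for size in range(1, len(group) + 1):
            for combo in combinations(group, size):
                cuboids.add(frozenset(combo))
    return cuboids

# Hypothetical routing rule: pick the precomputed cuboid with the fewest
# dimensions that contains every queried dimension. Any extra dimensions in
# that cuboid must be aggregated away at query time.
def route_query(query_dims, cuboids):
    wanted = frozenset(query_dims)
    candidates = [c for c in cuboids if wanted <= c]
    if not candidates:
        raise ValueError("no precomputed cuboid covers these dimensions")
    best = min(candidates, key=lambda c: (len(c), sorted(c)))
    return best, best - wanted

DIMS = ["TIME", "COMPANY", "CUSTOMER"]
two_groups = build_cuboids([["TIME", "COMPANY"], ["CUSTOMER"]], DIMS)
one_group = build_cuboids([DIMS], DIMS)

print(len(two_groups), len(one_group))  # 5 7
print(route_query({"TIME", "CUSTOMER"}, two_groups))
print(route_query({"TIME", "CUSTOMER"}, one_group))
```

With two groups the model builds 5 cuboids instead of 7, which is the storage and build-time saving that motivates a separate group for a rarely used HCD; the trade-off is that cross-group queries read a larger cuboid and aggregate on the fly.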


*Roberto Tardío Olmos*
/Senior Big Data & Business Intelligence Consultant/

Avenida de Brasil, 17, Floor 16.

28020 Madrid

Landline: 91.788.34.10
