kylin-user mailing list archives

From Kang-Sen Lu <...@anovadata.com>
Subject question about how kylin chooses a specific cube design over others
Date Mon, 07 Jan 2019 13:05:39 GMT
I am using kylin 2.5.1. I have a data model and two cube designs on top of that data model.

The first cube is used to perform aggregation over a set of aggregation groups, so its metrics
all use the "COUNT DISTINCT" and "SUM" functions. To speed up the TOPN application, I have
created another, smaller cube design that addresses the TOPN application specifically, i.e.
its metrics contain only "COUNT DISTINCT" and "TOPN", but no "SUM" aggregation.

Here is a normal query, and I am surprised that the TOPN cube was selected for it. That caused
over 10,000,000 rows of data to be returned, and the query failed.

From the kylin.log, you can see the "SELECT" statement and the cost evaluation data. I
am not sure what caused the wrong cube to be selected. I hope someone can provide
me with some hints or references.


2019-01-07 07:55:46,137 INFO  [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] service.QueryService:387
: The original query:  SELECT  (COUNT(DISTINCT ZETTICSDW.A_MA_HOURLY_V.SUBSCRIBER_ID)), (SUM(ZETTICSDW.A_MA_HOURLY_V.HITS)),
(SUM(ZETTICSDW.A_MA_HOURLY_V.PAGE_VIEWS)), (SUM(ZETTICSDW.A_MA_HOURLY_V.SESSIONS)), (SUM(ZETTICSDW.A_MA_HOURLY_V.SESSION_TIME)),
(SUM(ZETTICSDW.A_MA_HOURLY_V.DOWN_BYTES)), (SUM(ZETTICSDW.A_MA_HOURLY_V.DATA_CONSUMED)), (SUM(ZETTICSDW.A_MA_HOURLY_V.UP_BYTES))
 FROM  ZETTICSDW.A_MA_HOURLY_V   WHERE  ((ZETTICSDW.A_MA_HOURLY_V.THEDATE >= '20180501')
AND (ZETTICSDW.A_MA_HOURLY_V.THEDATE <= '20180501'))

2019-01-07 07:55:46,184 INFO  [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:58
: Find candidates by table ZETTICSDW.A_MA_HOURLY_V and project=Anovadata : CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]
2019-01-07 07:55:46,185 INFO  [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule,
realizations before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], realizations
after: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]]
2019-01-07 07:55:46,185 INFO  [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
realizations before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], realizations
after: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]]
2019-01-07 07:55:46,185 INFO  [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] rules.RealizationSortRule:40
: CUBE[name=ma_aggs_cube_6] priority 1 cost 279. CUBE[name=ma_aggs_topn_cube] priority 1 cost
105.
2019-01-07 07:55:46,186 INFO  [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations
before: [CUBE[name=ma_aggs_cube_6],CUBE[name=ma_aggs_topn_cube]], realizations after: [CUBE[name=ma_aggs_topn_cube],CUBE[name=ma_aggs_cube_6]]
2019-01-07 07:55:46,186 INFO  [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:95
: Adjust DimensionAsMeasure for FunctionDesc [expression=COUNT_DISTINCT, parameter=ZETTICSDW.A_MA_HOURLY_V.SUBSCRIBER_ID,
returnType=null]
2019-01-07 07:55:46,186 INFO  [Query 06ddbf4a-8cef-1515-463e-e1067eaaae3a-134709] routing.QueryRouter:75
: The realizations remaining: [CUBE[name=ma_aggs_topn_cube],CUBE[name=ma_aggs_cube_6]],and
the final chosen one for current olap context 0 is CUBE[name=ma_aggs_topn_cube]
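
To make the ordering in the RealizationSortRule lines concrete, here is a minimal sketch that reproduces it. It assumes, based only on the log output above, that candidates are ordered by priority first and then by cost, with lower values winning. The Candidate type and the sort_candidates function are hypothetical names for illustration, not Kylin's actual classes.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """Hypothetical stand-in for one realization listed in the log."""
    name: str
    priority: int
    cost: int


def sort_candidates(candidates: List[Candidate]) -> List[Candidate]:
    # Assumption: lower priority value first, then lower cost.
    # This matches the before/after order printed in the log.
    return sorted(candidates, key=lambda c: (c.priority, c.cost))


# Values copied from the RealizationSortRule log line.
candidates = [
    Candidate("ma_aggs_cube_6", priority=1, cost=279),
    Candidate("ma_aggs_topn_cube", priority=1, cost=105),
]

ranked = sort_candidates(candidates)
chosen = ranked[0]  # the first candidate after sorting is the one used

print([c.name for c in ranked])  # ['ma_aggs_topn_cube', 'ma_aggs_cube_6']
print(chosen.name)               # ma_aggs_topn_cube
```

Under that assumption, both cubes pass RemoveUncapableRealizationsRule and share priority 1, so the lower cost (105 versus 279) alone puts ma_aggs_topn_cube first.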

Thanks.

Kang-sen
