kylin-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "潘博存" <pa...@teld.cn>
Subject 回复:维度表数据发生变化后,sql查询出来的数据还是历史数据
Date Fri, 12 Oct 2018 10:01:36 GMT
是的,维度表应该是公共的,使用构件的规则来避免此类问题的发生


------------------------------------------------------------------
发件人:xixin <zxxbuaa@163.com>
发送时间:2018年10月12日(星期五) 16:17
收件人:user <user@kylin.apache.org>; 潘博存 <panbc@teld.cn>
主 题:Re: 维度表数据发生变化后,sql查询出来的数据还是历史数据


Hi
你的问题总结来说就是:两个数据不一致的Cube同时「满足」了sql的查询,不同Cube给出的结果自然不同。
目前应该是这样:Kylin对于一个sql,如何选择Cube(选择哪个Cube)对用户应该是透明的。实际使用中应该注意这种情况的出现。
个人理解仅供参考


Best regards,
Zhao Xixin 


在 2018年10月11日,19:28,潘博存 <panbc@teld.cn> 写道:
   1.kylin 的版本为2.3.1
   2.通过看日志发现一个问题,
     日常 我们对于维度表,不同的cube是公用一个维度表的,分析了一下,不知道下面的推断是否正确?

    1.场景:2个cube 同时用的相同的一个cube,这个时候 hive中的维度表数据发生了变化,kylin中的一个cube
进行了新数据的build, 但是另外一个cube 没有进行build,这个时候 使用kylin
查询维度表,数据
      会发生错乱?
    2.针对我们生产环境现象是
       模型:ChargeBillActiveProtectAnalysisCube 和模型:CHARGEBILL_V3  同时用了
维度表 :电站,但是CHARGEBILL_V3 基于新的业务数据进行了build,但是ChargeBillActiveProtectAnalysisCube
没有进行build。

      select * from V_KMSSTATION_V3 where name like '山东%'    ---这个sql 从日志看
使用了 cube:CHARGEBILL_V3   --得到了新的结果
      select name from V_KMSSTATION_V3 where name like '山东%'  --这个sql 从日志看
使用了cube:ChargeBillActiveProtectAnalysisCube  --得到了旧的数据

  疑问:这种公用维度表的cube,单独使用kylin sql 进行查询时, 使用哪一个cube
是如何判断的呢?不能公用维度表了? 应该如何处理这种场景呢?

     相关的明细日志如下:
      --执行sql:select * from V_KMSSTATION_V3 where name like '山东%'  --查询的结果是新数据
 执行日志如下:
2018-10-11 11:10:40,589 DEBUG [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] util.CheckUtil:35
: query cache disabled in KylinConfig
2018-10-11 11:10:40,589 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:428
: Using project: TELD
2018-10-11 11:10:40,589 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:429
: The original query:  select * from V_KMSSTATION_V3 where name like '山东%'
2018-10-11 11:10:40,591 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:646
: The corrected query: select * from V_KMSSTATION_V3 where name like '山东%'
LIMIT 50000
2018-10-11 11:10:40,598 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:58
: Find candidates by table DEFAULT.V_KMSSTATION_V3 and project=TELD : CUBE[name=CHARGEBILL_V3]
2018-10-11 11:10:40,598 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule,
realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after: [CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
realizations before: [CUBE[name=CHARGEBILL_V3]], realizations after: [CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] rules.RealizationSortRule:40
: CUBE[name=CHARGEBILL_V3] priority 1 cost 1447. 
2018-10-11 11:10:40,599 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations
before: [CUBE[name=CHARGEBILL_V3]], realizations after: [CUBE[name=CHARGEBILL_V3]]
2018-10-11 11:10:40,599 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] routing.QueryRouter:75
: The realizations remaining: [CUBE[name=CHARGEBILL_V3]],and the final chosen one for current
olap context 0 is CUBE[name=CHARGEBILL_V3]
2018-10-11 11:10:41,311 INFO  [Scheduler 1381016263 Job cd97b823-4d7f-44cd-a156-6d88d63794d8-445]
hive.CreateFlatHiveTableStep:38 : Map 1: 0/42 Map 11: 0/2     Map 12: 0/1     Map 13: 0/1
    Map 14: 0/1     Map 15: 0/1     Map 16: 0/1     Map 17: 0/1    Map 18: 0/1      Map 19:
0/1     Map 20: 0/1     Map 21: 0/1     Map 22: 0/1     Map 23: 0/1     Map 24: 0/1     Map
3: 0/8      Map 4: 0/1      Map 5: 0/5      Map 7: 0/5      Map 9: 0/5      Reducer 10: 0/2
Reducer 6: 0/2  Reducer 8: 0/2
2018-10-11 11:10:41,535 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:1068
: Processed rows for each storageContext: 0 
2018-10-11 11:10:41,536 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:510
: Stats of SQL response: isException: false, duration: 948, total scan count 0
2018-10-11 11:10:41,536 DEBUG [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] util.CheckUtil:35
: query cache is disabled
2018-10-11 11:10:41,536 INFO  [Query 472468f6-303d-4086-addf-9145cf0985a8-679403] service.QueryService:328
: 
==========================[QUERY]===============================
Query Id: 472468f6-303d-4086-addf-9145cf0985a8
SQL: select * from V_KMSSTATION_V3 where name like '山东%'
User: ADMIN
Success: true
Duration: 0.948
Project: TELD
Realization Names: [CUBE[name=CHARGEBILL_V3]]
Cuboid Ids: []
Total scan count: 0
Total scan bytes: 0
Result row count: 1
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Is Query Push-Down: false
Is Prepare: false
Trace URL: null
Message: null
==========================[QUERY]===============================



第2个sql:
   -执行sql:select name from V_KMSSTATION_V3 where name like '山东%'  --查询的结果是旧数据

--执行日志如下(奇怪的是这2个查询 日志上显示的cube不一样,
我们在定义时 ChargeBillActiveProtectAnalysisCube 和CHARGEBILL_V3 是公用的)
2018-10-11 11:13:48,547 DEBUG [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] util.CheckUtil:35
: query cache disabled in KylinConfig
2018-10-11 11:13:48,547 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:428
: Using project: TELD
2018-10-11 11:13:48,547 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:429
: The original query:  select name from V_KMSSTATION_V3 where name like '山东%'
2018-10-11 11:13:48,549 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:646
: The corrected query: select name from V_KMSSTATION_V3 where name like '山东%'
LIMIT 50000
2018-10-11 11:13:48,556 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:58
: Find candidates by table DEFAULT.V_KMSSTATION_V3 and project=TELD : CUBE[name=ChargeBillActiveProtectAnalysisCube]
2018-10-11 11:13:48,556 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RemoveBlackoutRealizationsRule,
realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], realizations after:
[CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RemoveUncapableRealizationsRule,
realizations before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], realizations after:
[CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] rules.RealizationSortRule:40
: CUBE[name=ChargeBillActiveProtectAnalysisCube] priority 1 cost 422. 
2018-10-11 11:13:48,557 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:51
: Applying rule: class org.apache.kylin.query.routing.rules.RealizationSortRule, realizations
before: [CUBE[name=ChargeBillActiveProtectAnalysisCube]], realizations after: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
2018-10-11 11:13:48,557 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] routing.QueryRouter:75
: The realizations remaining: [CUBE[name=ChargeBillActiveProtectAnalysisCube]],and the final
chosen one for current olap context 0 is CUBE[name=ChargeBillActiveProtectAnalysisCube]
2018-10-11 11:13:49,288 INFO  [Scheduler 1381016263 Job 0ad7c900-c8d9-4af5-bac0-58db06af4a67-378]
hive.CreateFlatHiveTableStep:38 : Map 1: 30(+41)/96   Map 11: 2/2     Map 12: 1/1     Map
13: 1/1     Map 14: 1/1     Map 15: 1/1     Map 16: 1/1     Map 17: 1/1     Map 18: 1/1  
  Map 19: 1/1     Map 20: 1/1     Map 21: 1/1     Map 22: 1/1     Map 23: 1/1     Map 24:
1/1     Map 3: 72(+2)/74        Map 4: 1/1      Map 5: 5/5      Map 7: 5/5      Map 9: 5/5
     Reducer 10: 2/2 Reducer 6: 2/2  Reducer 8: 2/2
2018-10-11 11:13:49,557 INFO  [Scheduler 1381016263 Job 0ad7c900-c8d9-4af5-bac0-58db06af4a67-378]
hive.CreateFlatHiveTableStep:38 : Map 1: 30(+42)/96   Map 11: 2/2     Map 12: 1/1     Map
13: 1/1     Map 14: 1/1     Map 15: 1/1     Map 16: 1/1     Map 17: 1/1     Map 18: 1/1  
  Map 19: 1/1     Map 20: 1/1     Map 21: 1/1     Map 22: 1/1     Map 23: 1/1     Map 24:
1/1     Map 3: 72(+2)/74        Map 4: 1/1      Map 5: 5/5      Map 7: 5/5      Map 9: 5/5
     Reducer 10: 2/2 Reducer 6: 2/2  Reducer 8: 2/2
2018-10-11 11:13:49,597 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:1068
: Processed rows for each storageContext: 0 
2018-10-11 11:13:49,597 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:510
: Stats of SQL response: isException: false, duration: 1050, total scan count 0
2018-10-11 11:13:49,597 DEBUG [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] util.CheckUtil:35
: query cache is disabled
2018-10-11 11:13:49,598 INFO  [Query 32b23403-77d4-40d5-9743-ce2b7f31628a-949958] service.QueryService:328
: 
==========================[QUERY]===============================
Query Id: 32b23403-77d4-40d5-9743-ce2b7f31628a
SQL: select name from V_KMSSTATION_V3 where name like '山东%'
User: ADMIN
Success: true
Duration: 1.051
Project: TELD
Realization Names: [CUBE[name=ChargeBillActiveProtectAnalysisCube]]
Cuboid Ids: []
Total scan count: 0
Total scan bytes: 0
Result row count: 1
Accept Partial: true
Is Partial Result: false
Hit Exception Cache: false
Storage cache used: false
Is Query Push-Down: false
Is Prepare: false
Trace URL: null
Message: null
==========================[QUERY]===============================





------------------------------------------------------------------
发件人:Na Zhai <na.zhai@kyligence.io>
发送时间:2018年10月11日(星期四) 17:38
收件人:user@kylin.apache.org <user@kylin.apache.org>; 潘博存 <panbc@teld.cn>
主 题:答复: 维度表数据发生变化后,sql查询出来的数据还是历史数据

Hi, bocun
What's your Kylin version? Can you provide the log information?
发送自 Windows 10 版邮件应用
 
发件人: 潘博存 <panbc@teld.cn>
发送时间: Thursday, October 11, 2018 3:16:50 PM
收件人: user
抄送: 冯志山
主题: 维度表数据发生变化后,sql查询出来的数据还是历史数据
今天,线上环境,发现一个问题,感觉很奇怪,详请教一下具体原因,以及如何避免以下现象的出现,感谢!

1.基于事实表A和维度表B创建了Cube c,其中 维度表B的字段有 id,code,name.
事实表A通过 字段BID 与B中的id字段进行关联 --B的name 字段是衍生列
2.可以使用kylin正常构件,每天正常构件。
 维度表有这样一条记录
  id   code   name
  001  001     山东大厦
3.一段时间之后,维度表中的上述数据发生了变化。

 id   code   name
 001  001    山东新闻大厦

 这个时候再次进行构件
4.在kylin中进行查询(kylin 中会生成事实表和维度表)
 select * from B   --查询结果   
  id  code  name
  001  001  山东新闻大厦

  select name from B --查询结果
   name
   山东大厦
 问题:
   为什么通过 select name from B  出来的数据是维度发生变化之前的数据呢,但是通过select
* from B 出来的数据 是新数据
Mime
View raw message