kylin-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Xiaoxiang Yu <xiaoxiang...@kyligence.io>
Subject Re: 答复: cube关联后的distinct问题
Date Mon, 29 Oct 2018 09:38:30 GMT
Hello gaoyang,



In your case,  you have two fact tables which both has one column "user_id", these two columns
both have duplicate values so you cannot use one of these fact tables as a lookup table. You
want get the distinct count of user_id which occur in both two sides.



If what I understand is correct, I guess you should enable query pushdown to let other sql
engine to do such task.

----------------
Best wishes,
Xiaoxiang Yu



发件人: 高扬02 <gaoyang02@bianfeng.com>
答复: "user@kylin.apache.org" <user@kylin.apache.org>
日期: 2018年10月29日 星期一 16:50
收件人: "user@kylin.apache.org" <user@kylin.apache.org>
主题: 答复: cube关联后的distinct问题

Hi Xiaoxiang,
Thank you for your reply.
So you mean , we cannot use Kylin to deal with the intersection of two sets from two fact
tables , is that true ?
If that is true , I cannot use “join” between two fact tables in two cubes.
And I cannot create a snowflake schema with the tow fact tables , because they have multi-multi
records after “join” , and the Exception is ” Dup key found” .
Please give me a suggestion , how to get the intersection of two sets from  two fact tables.
Thanks a lot.
Best regards.
Yang Gao

发件人: Xiaoxiang Yu [mailto:xiaoxiang.yu@kyligence.io]
发送时间: 2018年10月29日 15:44
收件人: user@kylin.apache.org
主题: Re: cube关联后的distinct问题
重要性: 高

Hi, kylin is a OLAP engine, it support star schema and snowflake schema. Both star schema
and snowflake schema support only one fact table. If these two tables cannnot be found in
one model, it cannot be executed on kylin.
As FAQ says [http://kylin.apache.org/docs/gettingstarted/faq.html]


Is Kylin a generic SQL engine for big data?

  *   No, Kylin is an OLAP engine with SQL interface. The SQL queries need be matched with
the pre-defined OLAP model.


----------------
Xiaoxiang Yu


发件人: 高扬02 <gaoyang02@bianfeng.com<mailto:gaoyang02@bianfeng.com>>
答复: "user@kylin.apache.org<mailto:user@kylin.apache.org>" <user@kylin.apache.org<mailto:user@kylin.apache.org>>
日期: 2018年10月25日 星期四 15:10
收件人: "user@kylin.apache.org<mailto:user@kylin.apache.org>" <user@kylin.apache.org<mailto:user@kylin.apache.org>>
主题: cube关联后的distinct问题

请教各位:

我需要关联两个fact表,所以目前是分别构建两个cube,要关联的字段为user_id,我将它设置同时为dimension和measure,以便可以取得明细(dimension)和计算count_distinct(measure),且不用开启query-push-down
但两个子查询分别命中了对应的cuboid,可是最外层去重查询报错,未知原因?

sql如下:
select count(distinct a.r_id) from
(select r.user_id as r_id,l.user_id as l_id from
(select user_id from register where data_day='2018-09-01' and app_id='403_20170811' )r
inner join
(select user_id from login where data_day='2018-09-01' and app_id='403_20170811')l
on l.user_id=r.user_id
)a;

报错:
Can't create EnumerableAggregate! while executing SQL: "select count(distinct a.r_bmid) from
(select r.bmid_user as r_bmid,l.bmid_user as l_bmid from (select bmid_user from wl_register_bm_mes
where data_day='2018-09-01' and app_id='403_20170811' )r inner join (select bmid_user from
wl_login_new_mes where data_day='2018-10-14' and app_id='403_20170811')l on l.bmid_user=r.bmid_user
)a LIMIT 50000"

期待解答。thx!
Mime
View raw message