kylin-user mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Build Measures with count distinct high cardinality column
Date Fri, 11 Mar 2016 10:28:07 GMT
What precision did you select for the two distinct counters? And is the 12
minutes the total time for building this cube?
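
For reference, here is a rough sketch (not part of the original exchange) of why the precision setting matters so much for storage. It assumes the approximate distinct counter is HyperLogLog-based, that the "error rate < 9.75%" and "error rate < 1.22%" options quoted below correspond to 10 and 16 precision bits, and that the quoted rate is a three-sigma bound on the standard error 1.04/sqrt(m). The function names are illustrative, and the one-byte-per-register size is an assumption.

```python
import math


def hll_registers(precision_bits: int) -> int:
    """Number of registers m = 2^p in a HyperLogLog counter."""
    return 1 << precision_bits


def hll_error_bound(precision_bits: int, sigmas: float = 3.0) -> float:
    """Error bound as a fraction: sigmas * 1.04 / sqrt(m).

    The three-sigma default is an assumption chosen because it
    reproduces the 9.75% and 1.22% figures quoted in the thread.
    """
    return sigmas * 1.04 / math.sqrt(hll_registers(precision_bits))


def expansion_rate(cube_size_mb: float, source_size_mb: float) -> float:
    """Cube size divided by source data size."""
    return cube_size_mb / source_size_mb


if __name__ == "__main__":
    for p in (10, 16):
        m = hll_registers(p)
        # Assumption: one byte per register, so m bytes per dense counter.
        print(f"p={p}: {m} registers, ~{m / 1024:.0f} KB per counter, "
              f"error bound < {hll_error_bound(p) * 100:.2f}%")
    # Figures reported in the quoted message: 765 MB cube from 70 MB of source data.
    print(f"expansion rate: {expansion_rate(765, 70):.1f}x")
```

Under these assumptions the higher precision keeps 64 times as many registers per counter. Since each aggregated row carries its own counter for every distinct-count measure, that would be consistent with a 70 MB source expanding to a 765 MB cube.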

2016-03-11 11:11 GMT+08:00 热爱大发挥 <385639827@qq.com>:

> Hive table records: 1,000,000
> Hive table size: 70 MB
>
> Cube info:
> normal dimensions: 8 (each with cardinality less than 6)
> measures: count(distinct uid), where the cardinality of "uid" is about 600,000
>           count(distinct keyword), where the cardinality of "keyword" is
> about 100,000
>
> Time taken: 12 min
> Cube size: 765 MB
>
>
>
> ------------------ Original Message ------------------
> *From:* "ShaoFeng Shi"<shaofengshi@apache.org>;
> *Sent:* Friday, March 11, 2016, 10:48 AM
> *To:* "user"<user@kylin.apache.org>;
> *Subject:* Re: Build Measures with count distinct high cardinality column
>
> Which precision (error rate) did you select for this measure? "error rate <
> 1.22%" takes much more storage than "error rate < 9.75%"; users need to
> select a precision that matches their needs.
>
> Also, when you state "cuboid size was very large and cast much time",
> please provide detailed information such as source data size, number of
> dimensions, dimension cardinality, measure definitions, your Hadoop cluster
> capacity, cube expansion rate, build time, etc. Otherwise we cannot assess
> the problem or comment on it.
>
> 2016-03-10 23:20 GMT+08:00 热爱大发挥 <385639827@qq.com>:
>
>> In the measures step, I tried a count distinct on a high-cardinality column
>> (like user_id), and found that the cuboid size was very large and the build
>> took a long time. Is count distinct on a high-cardinality column discouraged?
>>
>
>
>
> --
> Best regards,
>
> Shaofeng Shi
>
>


-- 
Best regards,

Shaofeng Shi
