kylin-user mailing list archives

From big data <bigdatab...@outlook.com>
Subject Re: How Kylin use bitmap to calculte the cardinality of non-integer field?
Date Tue, 19 Jul 2016 06:01:18 GMT
Thank you, Sun. I'm still downloading the code, so I first browsed the
articles about the Kylin dictionary. I still have some open questions about it:

1. This article (http://kylin.apache.org/blog/2015/08/13/kylin-dictionary/)
describes the Trie structure for the dictionary, but I didn't catch how the
Seq No. in the Trie example is generated. How does the dictionary generate
the seq no. for each incoming string?

2. If the string field is a user id or device id with millions (or even
billions) of UUIDs, the Trie will have a fixed height (the length of a
UUID, such as 32 bytes), so the dictionary will be huge. Does Kylin
still calculate the exact cardinality, or an approximate value? And
how can Kylin keep query performance for such a huge dictionary?

Thanks.



On 16/7/19 at 11:01 AM, Yerui Sun wrote:
> Generally speaking, we use a dictionary to encode non-integer values, and map the
> dict id into a bitmap to count.
>
> In more detail, the original dictionary in Kylin is at segment level, which means that the
> same value in different segments may have different dict ids, which makes the result wrong
> when counting values across segments.
> We've introduced GlobalDictionary to solve this problem. The Global Dict is at cube level,
> making sure one value has one stable dict id, no matter which or how many segments the value
> shows up in. The Global Dict is appendable, to support incremental cube building, and it's
> also splittable with an LRU cache, to reduce memory cost and support huge datasets, such
> as 500M.
>
> The code has been merged into the master branch and will be released in v1.5.3; you can
> check it out.
>
> Any comments or discussion are welcome.
>
> Thanks.
>
>> On July 18, 2016, at 15:41, big data <bigdatabase@outlook.com> wrote:
>>
>> I heard that Kylin supports non-integer fields by using a bitmap index.
>>
>> I just want to know how Kylin indexes a string field and maps each
>> item to a bitmap?
>>
>> Thanks.
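
The difference Sun describes between segment-level and cube-level dictionaries can be illustrated with a small sketch. This is not Kylin's code: the names build_segment_dict, AppendOnlyDict, to_bitmap and cardinality are hypothetical, a plain Python integer stands in for a real bitmap, and assigning segment-level ids by sorted order is an assumption made only for the illustration.

```python
def build_segment_dict(values):
    """Segment-level dictionary (assumption: ids follow the sorted
    order of the values seen in this one segment)."""
    return {v: i for i, v in enumerate(sorted(set(values)))}


class AppendOnlyDict:
    """Cube-level dictionary sketch: a value keeps the id it was given
    the first time it was seen, and new values are appended."""

    def __init__(self):
        self.ids = {}

    def encode(self, value):
        # Existing values return their stable id; new ones get the next id.
        return self.ids.setdefault(value, len(self.ids))


def to_bitmap(ids):
    """Set one bit per dictionary id (a Python int used as a bitset)."""
    bitmap = 0
    for i in ids:
        bitmap |= 1 << i
    return bitmap


def cardinality(bitmap):
    """Count distinct ids by counting the set bits."""
    return bin(bitmap).count("1")


# Two segments that share one value ("u-c"); 4 distinct values overall.
seg1 = ["u-a", "u-b", "u-c"]
seg2 = ["u-c", "u-d"]

# Segment-level dictionaries: "u-c" is id 2 in seg1 but id 0 in seg2,
# so OR-ing the two bitmaps merges unrelated values.
d1 = build_segment_dict(seg1)
d2 = build_segment_dict(seg2)
seg_level_count = cardinality(
    to_bitmap(d1[v] for v in seg1) | to_bitmap(d2[v] for v in seg2)
)

# One shared append-only dictionary: every value has a single stable id,
# so the OR of the per-segment bitmaps counts each value exactly once.
global_dict = AppendOnlyDict()
bm1 = to_bitmap(global_dict.encode(v) for v in seg1)
bm2 = to_bitmap(global_dict.encode(v) for v in seg2)
global_count = cardinality(bm1 | bm2)

print(seg_level_count, global_count)
```

With these two toy segments the segment-level count comes out lower than the true number of distinct values, while the shared dictionary gives the exact count. The sketch leaves out the splitting and LRU caching that Sun mentions for large dictionaries.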
