kylin-user mailing list archives

From hongbin ma <mahong...@apache.org>
Subject Re: How Kylin use bitmap to calculte the cardinality of non-integer field?
Date Wed, 20 Jul 2016 05:30:11 GMT
The original JIRA for the global dictionary is
https://issues.apache.org/jira/browse/KYLIN-1705. It is now pending on the GUI
part: https://issues.apache.org/jira/browse/KYLIN-1904

On Tue, Jul 19, 2016 at 2:01 PM, big data <bigdatabase@outlook.com> wrote:

> Thank you, Sun. I'm still downloading the code, so I first browsed the
> articles about the Kylin dictionary. I still have some open questions:
>
> 1. This
> article (http://kylin.apache.org/blog/2015/08/13/kylin-dictionary/)
> describes the Trie structure for the dictionary, but I didn't catch how
> the Seq No. is generated in the Trie example. How does the dictionary
> generate the seq no. for each incoming string?
>
> 2. If the string field is a user id or device id with millions (or even
> billions) of UUIDs, the Trie will have a fixed height (the length of a
> UUID, such as 32 bytes), so the dictionary will be huge. Does Kylin
> still calculate the exact cardinality, or an approximate value? And
> how can Kylin keep query performance acceptable for such a huge dictionary?
>
> Thanks.
>
>
>
> On 2016/7/19, at 11:01 AM, Yerui Sun wrote:
> > Generally speaking, we use a dictionary to encode non-integer values and
> > map each dict id into a bitmap to count.
> >
> > In more detail, the original dictionary in Kylin is at segment level, which
> > means that the same value may have different dict ids in different
> > segments, making the result wrong when counting values across segments.
> > We've introduced GlobalDictionary to solve this problem. The Global Dict is
> > at cube level, ensuring that each value has one stable dict id, no matter
> > which segments, or how many segments, the value shows up in. The Global
> > Dict is appendable, to support incremental cube building. It is also
> > splittable, with an LRU cache to reduce the memory cost, so it can support
> > huge datasets (500M, for example).
> >
> > The code has been merged into the master branch and will be released in
> > v1.5.3; you can check it out.
> >
> > Any comments or discussion are welcome.
> >
> > Thanks.
> >
> >> On July 18, 2016, at 15:41, big data <bigdatabase@outlook.com> wrote:
> >>
> >> I heard that Kylin supports non-integer fields by using a bitmap index.
> >>
> >> I just want to know how Kylin indexes a string field and maps each
> >> item to a bitmap.
> >>
> >> Thanks.
> > .
> >
>
>
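
To illustrate the point in the quoted reply about segment-level versus
cube-level dictionaries, here is a minimal sketch in Python. It is not Kylin's
actual code: the class AppendOnlyDict, the helper functions, and the sample
data are all hypothetical, and a plain Python integer stands in for a real
bitmap. It only shows why merging bitmaps built from per-segment ids can
undercount, while one shared append-only dictionary gives the exact count.

```python
class AppendOnlyDict:
    """Hypothetical append-only dictionary: value -> stable integer id."""

    def __init__(self):
        self._ids = {}

    def encode(self, value):
        # Assign the next unused id on first sight; existing ids never change.
        return self._ids.setdefault(value, len(self._ids))


def build_bitmap(values, dictionary):
    """Set one bit per dict id; a Python int is used as a toy bitmap."""
    bitmap = 0
    for value in values:
        bitmap |= 1 << dictionary.encode(value)
    return bitmap


def cardinality(bitmap):
    """Count the set bits, i.e. the number of distinct ids seen."""
    return bin(bitmap).count("1")


# Two segments of made-up user ids; "bob" appears in both.
segment_1 = ["alice", "bob", "carol"]
segment_2 = ["dave", "bob", "erin"]

# Segment-level dictionaries: each segment numbers its values from 0,
# so unrelated values collide on the same bit when bitmaps are merged.
seg_bitmaps = [build_bitmap(seg, AppendOnlyDict()) for seg in (segment_1, segment_2)]
per_segment_count = cardinality(seg_bitmaps[0] | seg_bitmaps[1])

# Cube-level (global) dictionary: one shared id space across segments.
global_dict = AppendOnlyDict()
global_bitmaps = [build_bitmap(seg, global_dict) for seg in (segment_1, segment_2)]
global_count = cardinality(global_bitmaps[0] | global_bitmaps[1])

print(per_segment_count, global_count)
```

With this sample data the merged per-segment bitmaps report fewer distinct
values than actually exist, while the shared dictionary matches the true
distinct count. Appending new values never changes ids already handed out,
which is what lets later segments reuse the same dictionary.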


-- 
Regards,

*Bin Mahone | 马洪宾*
