kylin-issues mailing list archives

From "Zhong Yanghong (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-3491) Improve the cube building process when using global dictionary
Date Fri, 10 Aug 2018 09:05:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16575986#comment-16575986 ]

Zhong Yanghong commented on KYLIN-3491:
---------------------------------------

For a dimension with a cardinality of about 90M, encoding performance compares as follows:
 * Encoding directly with the global dictionary: 165 min
 * Encoding in two steps with a shrunken dictionary: 35 + 11 = 46 min

> Improve the cube building process when using global dictionary
> --------------------------------------------------------------
>
>                 Key: KYLIN-3491
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3491
>             Project: Kylin
>          Issue Type: Improvement
>            Reporter: Zhong Yanghong
>            Assignee: Zhong Yanghong
>            Priority: Major
>
> In the current cubing process, if the global dictionary is very large, it is hard to
> encode raw values into ids for the bitmap input: the raw data records are unsorted, so
> the dictionary slices are swapped frequently. We need a refined process. The idea is as follows:
>  # For each source data block, a mapper generates the distinct values and sorts them.
>  # Encode the sorted distinct values and generate a shrunken dict for each source data block.
>  # When building the base cuboid, use each source data block's shrunken dict for encoding.
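
The three steps above can be sketched roughly as follows. This is a toy model, not Kylin code: SlicedGlobalDict, build_shrunken_dict and encode_block are hypothetical names, the "global dictionary" is just an in-memory map split into fixed-size slices, and a slice "load" is counted whenever a lookup lands in a different slice than the previous one. It only illustrates why encoding sorted distinct values touches each slice at most once, while encoding unsorted records can swap slices repeatedly.

```python
from bisect import bisect_right


class SlicedGlobalDict:
    """Toy stand-in for a large global dictionary (hypothetical, not Kylin's class).

    Values are sorted and split into slices; only one slice is "loaded" at a time,
    and every switch to a different slice is counted as a load.
    """

    def __init__(self, values, slice_size):
        ordered = sorted(set(values))
        self.ids = {v: i for i, v in enumerate(ordered)}
        self.slice_starts = ordered[::slice_size]  # first value of each slice
        self.loaded = None
        self.loads = 0

    def encode(self, value):
        # Find the slice whose value range contains `value`.
        slice_no = bisect_right(self.slice_starts, value) - 1
        if slice_no != self.loaded:
            self.loaded = slice_no
            self.loads += 1
        return self.ids[value]


def build_shrunken_dict(block, global_dict):
    """Steps 1 and 2: sort the block's distinct values, then encode them in order."""
    return {v: global_dict.encode(v) for v in sorted(set(block))}


def encode_block(block, shrunken):
    """Step 3: encode the raw (unsorted) records with the small per-block dict."""
    return [shrunken[v] for v in block]
```

For example, with 100 values in slices of 10 and an unsorted block that alternates between distant values, direct encoding switches slices on almost every record, whereas the two-step path loads each needed slice once and produces the same ids.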



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
