kylin-user mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Details about “Extract Fact Table Distinct Columns and Build Dimension Dictionary”
Date Mon, 02 Sep 2019 02:55:25 GMT
This article can help, to some extent:

https://kylin.apache.org/docs/howto/howto_optimize_build.html

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Email: shaofengshi@apache.org

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




ITzhangqiang <ITzhangqiang@163.com> wrote on Mon, Sep 2, 2019 at 10:23 AM:

> Hi Yaqian:
>
>        Thanks for your reply!
>
> I understand what you said, but I would like to know more details.
>
>
>
> Sent from Mail <https://go.microsoft.com/fwlink/?LinkId=550986> for Windows 10
>
>
>
> *From: *Yaqian Zhang <Yaqian_Zhang@126.com>
> *Sent: *September 1, 2019 16:03
> *To: *user@kylin.apache.org
> *Subject: *Re: Details about “Extract Fact Table Distinct Columns and Build
> Dimension Dictionary”
>
>
>
> Hi Johnson:
>
>        In these steps, Kylin calculates the cardinality of each dimension
> column and builds a dictionary for it.
>
>        To save space and improve efficiency, Kylin encodes and compresses
> dimension values, using dictionary encoding by default. Dictionary encoding
> builds a mapping table from string to int for all the values of a dimension,
> then serializes and saves the dictionary, which greatly reduces storage
> size. The dictionary is order-preserving: if string A is greater than string
> B, the encoded value of A is greater than the encoded value of B. This
> allows the encoded values to be used directly in HBase queries without
> decoding.
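
To illustrate the order-preserving property described above, here is a minimal sketch in Python. It is not Kylin's actual dictionary implementation; the function names and the sample city values are made up for the example. It only shows how assigning IDs in sorted order keeps comparisons on IDs consistent with comparisons on the original strings.

```python
from bisect import bisect_left


def build_dictionary(values):
    """Return the sorted distinct values; a value's ID is its index."""
    return sorted(set(values))


def encode(dictionary, value):
    """Look up the integer ID of a value with a binary search."""
    i = bisect_left(dictionary, value)
    if i == len(dictionary) or dictionary[i] != value:
        raise KeyError(value)
    return i


def decode(dictionary, code):
    """Map an integer ID back to the original string."""
    return dictionary[code]


# Hypothetical values of one dimension column.
cities = ["Shanghai", "Beijing", "Shenzhen", "Beijing", "Hangzhou"]
dictionary = build_dictionary(cities)
ids = [encode(dictionary, c) for c in cities]

# Order is preserved: comparing IDs gives the same result as comparing strings,
# so a range filter can be evaluated on the encoded values directly.
for a in cities:
    for b in cities:
        assert (a < b) == (encode(dictionary, a) < encode(dictionary, b))
```

Because the IDs follow the sort order of the strings, a filter such as "city >= 'Hangzhou'" can be rewritten as a filter on the encoded integers.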
>
>        However, dictionary encoding requires maintaining a mapping table,
> so the dimension cardinality must be considered. Cardinality is the number
> of distinct values in the dimension column. If the cardinality of a
> dimension is very high, the dictionary becomes very large and is not
> suitable for loading into memory. In that case, another encoding method
> should be chosen. By default, the maximum cardinality allowed for Kylin
> dictionary encoding is 5 million, which is configured by the parameter
> kylin.dictionary.max.cardinality.
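
The cardinality check described above can be sketched as follows. This is only an illustration in Python and not Kylin's build code: the helper names, the in-memory rows, and the sample columns are assumptions. The only figure taken from this thread is the default limit of 5 million.

```python
# Default limit quoted above (parameter kylin.dictionary.max.cardinality).
MAX_DICT_CARDINALITY = 5_000_000


def distinct_values(rows, column):
    """Collect the distinct values of one column (hypothetical helper)."""
    return {row[column] for row in rows}


def can_use_dictionary(cardinality, limit=MAX_DICT_CARDINALITY):
    """True if the column is small enough for dictionary encoding."""
    return cardinality <= limit


# Hypothetical fact table rows.
rows = [
    {"city": "Beijing", "user_id": "u1"},
    {"city": "Shanghai", "user_id": "u2"},
    {"city": "Beijing", "user_id": "u3"},
]

cardinality = {col: len(distinct_values(rows, col)) for col in ("city", "user_id")}
for col, n in cardinality.items():
    choice = "dictionary" if can_use_dictionary(n) else "another encoding"
    print(f"{col}: cardinality={n}, use {choice}")
```

In a real build the distinct values are extracted by a distributed job rather than in memory; the sketch only shows the decision that the cardinality feeds into.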
>
>
>
> On Aug 30, 2019, at 8:29 PM, Johnson <itzhangqiang@163.com> wrote:
>
>
>
> Hi all,
>
> ·         I want to know the details of these two steps: Extract Fact
> Table Distinct Columns and Build Dimension Dictionary. What do these steps
> do, and how do they work?
>
> ·         Looking forward to your reply.
>
>
>
> ----------------------
>
> Best wishes,
>
> Johnson
>
