kylin-user mailing list archives

From Yaqian Zhang <>
Subject Re: Details about “Extract Fact Table Distinct Columns and Build Dimension Dictionary”
Date Sun, 01 Sep 2019 08:02:45 GMT
Hi Johnson:
	Taken together, these two steps calculate the cardinality of each dimension column and build
a dictionary for it.
	To save space and improve efficiency, Kylin encodes and compresses dimension values, and it
uses dictionary encoding by default. Dictionary encoding builds a mapping table from string
to int that covers every value of the dimension, then serializes and saves the dictionary,
which greatly reduces storage size. The dictionary preserves order: if string A is greater
than string B, the encoded value of A is greater than the encoded value of B. This allows
the encoded values to be used directly in HBase queries without decoding.
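
	As a rough illustration of the order-preserving mapping described above, here is a minimal
Python sketch. It is not Kylin's actual dictionary implementation; the function name
build_dictionary and the sample values are made up for the example, and it simply assumes
that IDs are assigned in sorted string order.

```python
def build_dictionary(values):
    """Map each distinct string to an int ID, assigned in sorted order."""
    distinct = sorted(set(values))
    return {value: idx for idx, value in enumerate(distinct)}


# Hypothetical dimension column with repeated values.
column = ["beijing", "shanghai", "beijing", "hangzhou", "shenzhen"]

dictionary = build_dictionary(column)
encoded = [dictionary[v] for v in column]

# Because IDs follow string order, a range filter on strings can be
# rewritten as a range filter on the encoded ints, with no decoding.
low, high = dictionary["hangzhou"], dictionary["shanghai"]
matches = [v for v, code in zip(column, encoded) if low <= code <= high]
```

Here the filter "hangzhou" <= value <= "shanghai" is evaluated purely on the int codes, which
is the property that lets encoded values be compared without decoding.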
	However, because dictionary encoding requires maintaining a mapping table, the cardinality of
the dimension (the number of distinct values in the dimension column) has to be considered.
If the cardinality is very high, the dictionary becomes very large and is not suitable for
loading into memory. In that case, another encoding method should be chosen. By default, the
maximum cardinality Kylin allows for dictionary encoding is 5 million, which is configured
by the parameter kylin.dictionary.max.cardinality.
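
	The cardinality check can be pictured with the following Python sketch. It is only an
illustration under stated assumptions: choose_encoding and the labels it returns are
hypothetical names, the distinct values are counted exactly with a set for simplicity, and the
5 million limit is the default quoted above.

```python
MAX_DICT_CARDINALITY = 5_000_000  # default limit quoted in this message


def choose_encoding(values, max_cardinality=MAX_DICT_CARDINALITY):
    """Return (cardinality, label), where label says whether a dictionary fits.

    The labels are illustrative only and are not Kylin encoding names.
    """
    cardinality = len(set(values))  # number of distinct values in the column
    if cardinality > max_cardinality:
        return cardinality, "non-dictionary"
    return cardinality, "dictionary"


# Three rows, two distinct values: well under the limit.
print(choose_encoding(["a", "b", "a"]))

# Lower the limit artificially to show the other branch.
print(choose_encoding(["a", "b", "c"], max_cardinality=2))
```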

> On Aug 30, 2019, at 8:29 PM, Johnson <> wrote:
> Hi all,
> I want to know the details of these two steps: Extract Fact Table Distinct Columns and
> Build Dimension Dictionary. What do these steps do, and how do they work?
> Looking forward to your reply.
> ----------------------
> Best wishes,
> Johnson
