kylin-user mailing list archives

From ShaoFeng Shi <>
Subject Re: Precisely Count Distinct cause spark data skew
Date Fri, 14 Dec 2018 00:20:58 GMT
Hi Ming,

Firstly, are you using a global dictionary for these two columns (ORDER_ID,
CUSTOMER_ID)? If the columns are of integer type, the dictionary is not needed.
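
To illustrate why an integer column can skip the dictionary, here is a minimal
Python sketch. It is not Kylin's actual code: the class and function names are
hypothetical, and a plain Python set stands in for the bitmap that a precise
count distinct keeps. It assumes the dictionary's only job is to turn each
distinct value into an integer ID.

```python
from typing import Dict, Iterable, Set


class GlobalDictionary:
    """Toy stand-in (hypothetical) for a global dictionary:
    maps each distinct string value to a stable integer ID."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}

    def encode(self, value: str) -> int:
        # Assign the next unused ID the first time a value is seen.
        if value not in self._ids:
            self._ids[value] = len(self._ids)
        return self._ids[value]


def count_distinct_strings(values: Iterable[str], dictionary: GlobalDictionary) -> int:
    # String values have to be encoded to integers before they can be
    # recorded in the bitmap (a set is used here as a stand-in).
    bitmap: Set[int] = {dictionary.encode(v) for v in values}
    return len(bitmap)


def count_distinct_ints(values: Iterable[int]) -> int:
    # Integer values can be recorded directly, so no dictionary is involved.
    bitmap: Set[int] = set(values)
    return len(bitmap)
```

In this sketch the integer path never touches the dictionary, which is the
step an integer-typed ORDER_ID or CUSTOMER_ID would avoid.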

Secondly, Kylin 2.5 includes an improvement for building cubes with a
global dictionary; see KYLIN-3491. Please consider upgrading.

Thirdly, for complex measures such as count distinct and top-N,
MapReduce is more stable than Spark. If the two suggestions above don't
solve the problem, try switching the build engine to MapReduce.

Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC

明の <> wrote on Monday, 10 December 2018 at 5:47 PM:

> Hi all,
> This is my first time asking a question here; nice to meet you!
> I built my cube with a Precisely Count Distinct measure, and then Spark
> data skew occurred. In addition, the Spark input data was very large!
> Why is that, and how can I fix it?
> Thanks all
