kylin-user mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Precisely Count Distinct cause spark data skew
Date Fri, 14 Dec 2018 00:20:58 GMT
Hi Ming,

Firstly, are you using a global dictionary for these two columns (ORDER_ID,
CUSTOMER_ID)? If the columns are of integer type, the dictionary is not needed.
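
To illustrate why integer columns can skip the dictionary, here is a minimal
Python sketch. It is not Kylin's code: a plain dict stands in for the global
dictionary, a Python set stands in for the bitmap behind precise count
distinct, and the function names and sample values are hypothetical.

```python
def encode_with_dictionary(values):
    """Toy stand-in for a global dictionary: map each distinct value to a dense int ID."""
    dictionary = {}
    ids = []
    for value in values:
        if value not in dictionary:
            dictionary[value] = len(dictionary)
        ids.append(dictionary[value])
    return dictionary, ids


def precise_count_distinct(int_ids):
    """A set of ints stands in for a bitmap; its size is the exact distinct count."""
    bitmap = set(int_ids)
    return len(bitmap)


# String values must first be encoded to integers before they fit in a bitmap.
customer_names = ["alice", "bob", "alice", "carol"]
_, encoded = encode_with_dictionary(customer_names)
string_count = precise_count_distinct(encoded)

# Integer values can be placed in the bitmap directly, with no dictionary step.
order_ids = [1001, 1002, 1001, 1003]
int_count = precise_count_distinct(order_ids)

print(string_count, int_count)
```

The dictionary step is the extra work that only non-integer columns need; for
integer columns the values themselves already serve as bitmap positions.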

Secondly, Kylin 2.5 includes an improvement for building cubes with a
global dictionary; see KYLIN-3491. Please consider upgrading.

Thirdly, for complex measures such as count distinct and top-N,
MapReduce is more stable than Spark. If the two suggestions above don't
solve it, try switching the engine.


Best regards,

Shaofeng Shi 史少锋
Apache Kylin PMC
Work email: shaofeng.shi@kyligence.io
Kyligence Inc: https://kyligence.io/

Apache Kylin FAQ: https://kylin.apache.org/docs/gettingstarted/faq.html
Join Kylin user mail group: user-subscribe@kylin.apache.org
Join Kylin dev mail group: dev-subscribe@kylin.apache.org




明の <15816188899@qq.com> wrote on Mon, Dec 10, 2018 at 5:47 PM:

> Hi all~
>       This is my first time asking a question here; nice to meet you!
>
>         I have built my cube with Precisely Count Distinct, and then
> Spark data skew occurred. In addition, the Spark input data was very
> large!
>
>         Why is that, and how can I fix it?
>
>
>                                  Thanks all~
>
>
>
>
