kylin-dev mailing list archives

From "Chao Long" <wayn...@qq.com>
Subject Re: Redistribute intermediate table default not by rand()
Date Fri, 02 Nov 2018 05:38:11 GMT
Hi zhixin,
 The data may become incorrect if "distribute by rand()" is used:
 https://issues.apache.org/jira/browse/KYLIN-3388
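
 A minimal sketch of how a random distribution key can corrupt data, assuming the failure mode is a map task that is retried after some reducers have already fetched its output (this scenario is an assumption for illustration and is not taken from the issue text). The helpers partition_random and partition_by_key are hypothetical and simulate the shuffle in plain Python; they are not Kylin or Hive code.

```python
import random
import zlib

def partition_random(rows, num_reducers, rng):
    """Assign each row to a reducer at random, like DISTRIBUTE BY RAND()."""
    parts = [[] for _ in range(num_reducers)]
    for row in rows:
        parts[rng.randrange(num_reducers)].append(row)
    return parts

def partition_by_key(rows, num_reducers):
    """Assign each row by a deterministic hash of its content."""
    parts = [[] for _ in range(num_reducers)]
    for row in rows:
        parts[zlib.crc32(row.encode()) % num_reducers].append(row)
    return parts

rows = [f"row-{i}" for i in range(1000)]

# Random key: reducer 0 reads the first attempt, then the map task is
# rerun and reducer 1 reads the second attempt, which is partitioned
# differently because rand() yields different values.
attempt1 = partition_random(rows, 2, random.Random(1))
attempt2 = partition_random(rows, 2, random.Random(2))
received_random = attempt1[0] + attempt2[1]

# Deterministic key: both attempts produce identical partitions.
first = partition_by_key(rows, 2)
second = partition_by_key(rows, 2)
received_key = first[0] + second[1]

# True only if the reducers together hold every row exactly once.
random_is_exact = sorted(received_random) == sorted(rows)
key_is_exact = sorted(received_key) == sorted(rows)
print("random key exact:", random_is_exact)
print("column key exact:", key_is_exact)
```

 With the random key, some rows end up in both reducers and others in neither; with a deterministic key a retry is harmless.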




------------------ Original message ------------------
From: "liuzhixin"<liuzx32@163.com>;
Sent: Friday, November 2, 2018, 12:53 PM
To: "dev"<dev@kylin.apache.org>;
Cc: "ShaoFeng Shi"<shaofengshi@apache.org>;
Subject: Re: Redistribute intermediate table default not by rand()



Hi kylin team:

Step: Redistribute intermediate table
#
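By default, the first three dimension columns are chosen as the DISTRIBUTE BY
keys, instead of DISTRIBUTE BY RAND().
If none of those dimension columns is a suitable key, this default strategy will make the data even more skewed.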
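
A small simulation of the skew concern, as a sketch only: the row values, the 90% share, and the reducer count are made-up assumptions, and the skew helper is hypothetical. It compares hashing three low-cardinality columns against a random assignment.

```python
import random
import zlib
from collections import Counter

def skew(assignments, num_reducers):
    """Ratio of the largest reducer's row count to the ideal even share."""
    counts = Counter(assignments)
    return max(counts.values()) / (len(assignments) / num_reducers)

rng = random.Random(0)
num_reducers = 10

# Hypothetical data: about 90% of rows share the same three column values.
rows = []
for _ in range(10000):
    if rng.random() < 0.9:
        rows.append(("CN", "web", "2018"))
    else:
        rows.append((f"c{rng.randrange(50)}", "app", "2018"))

# DISTRIBUTE BY Field1, Field2, Field3: hash of the three columns.
by_fields = [zlib.crc32("|".join(r).encode()) % num_reducers for r in rows]
# DISTRIBUTE BY RAND(): uniform random reducer per row.
by_rand = [rng.randrange(num_reducers) for _ in rows]

skew_fields = skew(by_fields, num_reducers)
skew_rand = skew(by_rand, num_reducers)
print(f"skew by fields: {skew_fields:.2f}")
print(f"skew by rand(): {skew_rand:.2f}")
```

In this made-up data set one reducer receives roughly nine times its even share under the column key, while the random key stays close to 1.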

Best Regards!

> On Nov 2, 2018, at 12:03 PM, liuzhixin <liuzx32@163.com> wrote:
> 
> Hi kylin team:
> 
> Version: Kylin2.5-hadoop3.1 for hdp3.0
> #
> Step: Redistribute intermediate table
> #
> The generated DISTRIBUTE BY statement is:
> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
> #
> It does not use DISTRIBUTE BY RAND().
> #
> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I use DISTRIBUTE BY RAND() instead?
> 
> Best wishes.
>