kylin-dev mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Redistribute intermediate table default not by rand()
Date Fri, 02 Nov 2018 06:06:59 GMT
Hi Zhixin,

Kylin 2.5.1 will add some tips to the advanced step; I hope that helps.

liuzhixin <liuzx32@163.com> wrote on Fri, Nov 2, 2018 at 2:05 PM:

> Hi Chao Long:
>
> Thank you for the answer.
> #
> Maybe Kylin should provide a config option for every build step.
>
> Best wishes.
>
> > On Nov 2, 2018, at 1:38 PM, Chao Long <wayne.l@qq.com> wrote:
> >
> > Hi zhixin,
> > The data may become incorrect if you use "distribute by rand()".
> > https://issues.apache.org/jira/browse/KYLIN-3388
> >
> >
> >
> >
> > ------------------ Original Message ------------------
> > From: "liuzhixin"<liuzx32@163.com>;
> > Sent: Friday, Nov 2, 2018, 12:53 PM
> > To: "dev"<dev@kylin.apache.org>;
> > Cc: "ShaoFeng Shi"<shaofengshi@apache.org>;
> > Subject: Re: Redistribute intermediate table default not by rand()
> >
> >
> >
> > Hi kylin team:
> >
> > Step: Redistribute intermediate table
> > #
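> > By default, the first three dimension fields are used as the DISTRIBUTE BY keys instead of DISTRIBUTE BY RAND().
> > If there are no suitable dimension fields, this default strategy makes the data distribution even more skewed.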
> >
> > Best Regards!
> >
> >> On Nov 2, 2018, at 12:03 PM, liuzhixin <liuzx32@163.com> wrote:
> >>
> >> Hi kylin team:
> >>
> >> Version: Kylin2.5-hadoop3.1 for hdp3.0
> >> #
> >> Step: Redistribute intermediate table
> >> #
> >> The generated DISTRIBUTE BY statement is:
> >> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
> >> #
> >> It does not use DISTRIBUTE BY RAND().
> >> #
> >> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I use DISTRIBUTE BY RAND() instead?
> >>
> >> Best wishes.
>
>
>

-- 
Best regards,

Shaofeng Shi 史少锋
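
The trade-off discussed in this thread can be illustrated with a small simulation. This is only a sketch and not Kylin or Hive code: the reducer count, the sample rows, the field names, and the CRC32-based hash are all assumptions made for illustration. It compares distributing rows by three low-cardinality key fields with distributing them randomly, and then checks how many rows land on a different reducer when the assignment is recomputed, which is presumably the kind of non-determinism behind the correctness concern Chao Long points to.

```python
import random
import zlib
from collections import Counter

NUM_REDUCERS = 8  # hypothetical reducer count, not a Kylin default


def partition_by_keys(row, key_fields, num_reducers=NUM_REDUCERS):
    """Deterministic: identical key values always map to the same reducer."""
    key = "\x01".join(str(row[f]) for f in key_fields)
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def partition_by_rand(rng, num_reducers=NUM_REDUCERS):
    """Random: the result depends on the RNG state, not on the row."""
    return int(rng.random() * num_reducers)


def skew(assignments, num_reducers=NUM_REDUCERS):
    """Largest reducer's row count divided by the ideal even share."""
    counts = Counter(assignments)
    return max(counts.values()) * num_reducers / len(assignments)


# Hypothetical rows: three low-cardinality dimensions, 90% share one combination.
rows = [
    {"field1": "CN" if i % 10 else "US", "field2": "web", "field3": "2018", "id": i}
    for i in range(10_000)
]
keys = ["field1", "field2", "field3"]

# 1) Balance: key-based distribution vs. random distribution.
key_skew = skew([partition_by_keys(r, keys) for r in rows])
rand_skew = skew([partition_by_rand(random.Random(i)) for i in range(len(rows))])

# 2) Stability: recompute the assignment, as a retried task would.
first_keys = [partition_by_keys(r, keys) for r in rows]
retry_keys = [partition_by_keys(r, keys) for r in rows]
moved_keys = sum(a != b for a, b in zip(first_keys, retry_keys))

rng_first, rng_retry = random.Random(1), random.Random(2)
first_rand = [partition_by_rand(rng_first) for _ in rows]
retry_rand = [partition_by_rand(rng_retry) for _ in rows]
moved_rand = sum(a != b for a, b in zip(first_rand, retry_rand))

print(f"skew by keys: {key_skew:.2f}x, skew by rand: {rand_skew:.2f}x")
print(f"rows moved on recompute: keys={moved_keys}, rand={moved_rand}")
```

In this toy setup the key-based assignment is stable but heavily skewed, while the random assignment is balanced but changes when recomputed. A high-cardinality key gives both properties, which is why the choice of distribution columns matters.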