kylin-dev mailing list archives

From "Chao Long" <wayn...@qq.com>
Subject Re: Redistribute intermediate table default not by rand()
Date Fri, 02 Nov 2018 08:03:32 GMT
Hi zhixin,
   As I remember, if you set a "shard by" column on the cube design page, Kylin will use that
column as the "distribute by" condition, rather than the first three fields of the rowkey.
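
The selection rule described above can be sketched as follows. This is only an illustration of the behavior as I remember it, not Kylin's actual code: the function name, its parameters, and the table and column names are hypothetical.

```python
def build_redistribute_sql(table, rowkey_columns, shard_by_column=None):
    """Build the redistribute statement for an intermediate flat table.

    Hypothetical helper: if a "shard by" column is set, distribute by it;
    otherwise fall back to the first three rowkey columns.
    """
    if shard_by_column:
        distribute_columns = [shard_by_column]
    else:
        distribute_columns = list(rowkey_columns[:3])
    return (
        f"INSERT OVERWRITE TABLE {table} "
        f"SELECT * FROM {table} "
        f"DISTRIBUTE BY {', '.join(distribute_columns)}"
    )


# Without a "shard by" column: the first three rowkey fields are used.
default_sql = build_redistribute_sql(
    "table_intermediate", ["Field1", "Field2", "Field3", "Field4"]
)

# With a "shard by" column: only that column is used.
sharded_sql = build_redistribute_sql(
    "table_intermediate", ["Field1", "Field2", "Field3", "Field4"], "Field4"
)
```

So if none of the leading rowkey fields spreads the data well, setting a high-cardinality "shard by" column is the way to change the distribution.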




------------------ Original Message ------------------
From: "liuzhixin"<liuzx32@163.com>;
Sent: Friday, November 2, 2018, 3:11 PM
To: "dev"<dev@kylin.apache.org>;
Cc: "Chao Long"<wayne.l@qq.com>;
Subject: Re: Redistribute intermediate table default not by rand()



Hi Chao Long,

Thank you for the answer.
#
Step1: Create Intermediate Flat Hive Table
Step2: Redistribute intermediate table
#
Perhaps Kylin could insert a rand column into the intermediate Hive table and use it for the
subsequent shard step (as the default).
At the same time, Kylin should support a custom shard column (this is already provided).
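
One way to read this suggestion, as a toy simulation rather than anything Kylin or Hive actually runs: a random value stored as a column is read back unchanged when a task is retried, whereas rand() evaluated inside DISTRIBUTE BY is recomputed on every attempt. The failure scenario (one reducer fetching output before the mapper fails and is retried) and all names below are assumptions made for illustration.

```python
import random

NUM_REDUCERS = 4


def make_flat_table(n, seed=0):
    # Each row is (row_id, stored_rand); stored_rand is written once,
    # when the intermediate table is created.
    rng = random.Random(seed)
    return [(row_id, rng.randrange(1_000_000)) for row_id in range(n)]


def distribute_by_rand(table, attempt):
    # rand() is re-evaluated on every attempt, so the row-to-reducer
    # mapping differs between the first attempt and the retry.
    rng = random.Random(attempt)
    return {row_id: rng.randrange(NUM_REDUCERS) for row_id, _ in table}


def distribute_by_stored_column(table, attempt):
    # The stored value is the same on every attempt.
    return {row_id: stored % NUM_REDUCERS for row_id, stored in table}


def rows_received(first, retry, fetched_early):
    # Reducers in fetched_early read the first attempt's output;
    # the remaining reducers read the retry's output.
    received = []
    for reducer in range(NUM_REDUCERS):
        source = first if reducer in fetched_early else retry
        received.extend(r for r, target in source.items() if target == reducer)
    return sorted(received)


table = make_flat_table(1000)
expected = [row_id for row_id, _ in table]

rand_rows = rows_received(
    distribute_by_rand(table, 1), distribute_by_rand(table, 2), {0}
)
stored_rows = rows_received(
    distribute_by_stored_column(table, 1),
    distribute_by_stored_column(table, 2),
    {0},
)
```

In this simulation `stored_rows` matches the input exactly, while `rand_rows` contains duplicated and missing rows, which is the kind of incorrect data the rand() concern is about.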

Best Wishes.

> On November 2, 2018, at 1:38 PM, Chao Long <wayne.l@qq.com> wrote:
> 
> Hi zhixin,
> The data may become incorrect if "distribute by rand()" is used.
> https://issues.apache.org/jira/browse/KYLIN-3388
> 
> 
> 
> 
> ------------------ Original Message ------------------
> From: "liuzhixin"<liuzx32@163.com>;
> Sent: Friday, November 2, 2018, 12:53 PM
> To: "dev"<dev@kylin.apache.org>;
> Cc: "ShaoFeng Shi"<shaofengshi@apache.org>;
> Subject: Re: Redistribute intermediate table default not by rand()
> 
> 
> 
> Hi kylin team:
> 
> Step: Redistribute intermediate table
> #
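> By default, the first three dimension fields are used as the basis for DISTRIBUTE BY,
> rather than DISTRIBUTE BY RAND().
> If none of those dimension fields is suitable, this default strategy makes the data even more skewed.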
> 
> Best Regards!
> 
>> On November 2, 2018, at 12:03 PM, liuzhixin <liuzx32@163.com> wrote:
>> 
>> Hi kylin team:
>> 
>> Version: Kylin2.5-hadoop3.1 for hdp3.0
>> #
>> Step: Redistribute intermediate table
>> #
>> The DISTRIBUTE BY statement is:
>> INSERT OVERWRITE TABLE table_intermediate SELECT * FROM table_intermediate DISTRIBUTE BY Field1, Field2, Field3;
>> #
>> Not DISTRIBUTE BY RAND()
>> #
>> Is DISTRIBUTE BY Field1, Field2, Field3 the default? How can I use DISTRIBUTE BY RAND() instead?
>> 
>> Best wishes.