kylin-user mailing list archives

From ShaoFeng Shi
Subject Re: why distribute by partition column while creating flat hive table?
Date Tue, 23 Aug 2016 13:55:32 GMT
In 1.5.3, Kylin redistributes the source records by the "shard by" column
(if the user has selected one); "shard by" is defined on the cube's
"Advanced Setting" page. The "shard by" column should be a
high-cardinality column. In your case, I guess you set "shard by" = true
on the partition column by mistake; please set it to false and then
resubmit the build request.
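
To illustrate why the redistribution column should be high-cardinality, here is a minimal Python sketch of how rows spread across reducers when they are routed by a hash of the distribution key. It is a simulation only, not Kylin or Hive code: the function name `reducer_loads`, the reducer count, the row count, the use of CRC32 as the hash, and the "user id" column are all assumptions made for the example.

```python
from collections import Counter
import zlib


def reducer_loads(keys, num_reducers):
    """Count rows per reducer when each row goes to hash(key) % num_reducers."""
    loads = Counter()
    for key in keys:
        # CRC32 is used only as a deterministic stand-in hash (assumption);
        # Hive's actual hash function is different.
        bucket = zlib.crc32(str(key).encode("utf-8")) % num_reducers
        loads[bucket] += 1
    return [loads.get(i, 0) for i in range(num_reducers)]


NUM_REDUCERS = 10   # hypothetical reducer count
ROWS = 100_000      # hypothetical number of rows in one daily increment

# Distribute by the partition date: every row of the day shares one key,
# so a single reducer receives all of the rows.
by_date = reducer_loads(["2016-08-23"] * ROWS, NUM_REDUCERS)

# Distribute by a high-cardinality column (a made-up user id here):
# the rows spread across the reducers.
by_user = reducer_loads(range(ROWS), NUM_REDUCERS)

print("distribute by date   :", by_date)
print("distribute by user id:", by_user)
```

With a single date value per build, one reducer ends up doing all the work, which would be consistent with the step appearing to hang; a column with many distinct values lets the reducers share the load.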

2016-08-23 18:34 GMT+08:00 赵天烁:

> I have a table with a huge daily data increment, at the billion-row
> level. When I build a cube on that table, it gets stuck creating the flat
> Hive table, seemingly forever.
> I checked the MR job and found that the SQL for this step ends
> with "DISTRIBUTE BY ${partition date column}".
> I tried executing the same SQL manually with the "DISTRIBUTE BY" removed,
> and everything finished within 10 minutes.
> As far as I know, creating a flat table is helpful when I have a
> star schema, but all I have is the fact table. So why bother creating a
> table with the same structure and the same data, where the only
> difference is the table name?
> So I wonder whether it is possible to just create a view with the
> intermediate table name that Kylin needs when I haven't defined any lookup
> table. That would eliminate this long-running task, which seems to
> achieve nothing.
> ------------------------------
> 赵天烁
> Kevin Zhao
> 珠海市魅族科技有限公司
> MEIZU Technology Co., Ltd.
> 广东省珠海市科技创新海岸魅族科技楼
> MEIZU Tech Bldg., Technology & Innovation Coast
> Zhuhai, 519085, Guangdong, China

Best regards,

Shaofeng Shi
