kylin-user mailing list archives

From 赵天烁 <>
Subject Re: why distribute by partition column while creating flat hive table?
Date Tue, 23 Aug 2016 14:55:18 GMT
OK, I'll give it a shot. On the other hand, is it possible to skip the step that creates the
flat table when the source table is almost the same?

Sent from Meizu PRO6

-------- Original message --------
From: ShaoFeng Shi <>
Date: Tue, Aug 23, 21:56
To: user <>
Subject: Re: why distribute by partition column while creating flat hive table?

In 1.5.3, Kylin redistributes the source records by the "shard by" column (if the user selects
such a column); "shard by" is defined on the cube's "Advanced setting" page. The "shard
by" column should be a high-cardinality column. In your case, I guess you set the partition
column's "shard by" to true by mistake; please set it to false and then resubmit the build request.
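
To illustrate why the choice of column matters, here is a minimal Python sketch (not Kylin or Hive code). It assumes that DISTRIBUTE BY behaves like plain hash partitioning, where rows with the same key always land on the same reducer. The reducer count, the md5-based hash and the "user_id" column are hypothetical and chosen only for the demonstration.

```python
import hashlib
from collections import Counter

NUM_REDUCERS = 8  # assumed reducer count, for illustration only


def reducer_for(key, num_reducers=NUM_REDUCERS):
    """Toy stand-in for hash partitioning: the same key always maps to the same reducer."""
    digest = hashlib.md5(str(key).encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_reducers


def rows_per_reducer(keys):
    """Count how many rows each reducer would receive for the given distribution keys."""
    return Counter(reducer_for(k) for k in keys)


# A daily segment: every row carries the same partition date value.
by_date = rows_per_reducer(["2016-08-23"] * 100_000)

# The same number of rows distributed by a hypothetical high-cardinality column (user_id).
by_user = rows_per_reducer(range(100_000))

print("distribute by date   :", dict(by_date))
print("distribute by user_id:", dict(sorted(by_user.items())))
```

With a single date value, one reducer receives every row while the others sit idle, which would be consistent with a step that appears to hang on a billion-row daily increment. A high-cardinality key spreads the rows across all reducers.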

2016-08-23 18:34 GMT+08:00 赵天烁 <<>>:
I have a table that grows by a huge amount every day, at the billion-row level. When I build a
cube on that table, it gets stuck at the "create flat Hive table" step... forever.
I checked the MR job and found that the SQL for this step ends with "DISTRIBUTE
BY ${partition date column}".
I tried executing the same SQL manually with the "DISTRIBUTE BY" clause removed, and everything
finished within 10 minutes.
As far as I know, creating a flat table is helpful when there is a star schema, but
all I have is the fact table. So why bother creating a table with the same structure
and the same data? The only difference is the table name...
So I wonder: is it possible to just create a view with the intermediate table name that Kylin needs
when I haven't defined any lookup table? That would eliminate a long-running task that seems
to achieve nothing.

Kevin Zhao<>

MEIZU Technology Co., Ltd.
MEIZU Tech Bldg., Technology & Innovation Coast
Zhuhai, 519085, Guangdong, China<>

Best regards,

Shaofeng Shi
