kylin-user mailing list archives

From ShaoFeng Shi <shaofeng...@apache.org>
Subject Re: Partition Date Issue
Date Wed, 16 May 2018 12:00:00 GMT
Hi Debdutto,

To support different partitioning schemes, Kylin has an "IPartitionConditionBuilder"
interface, and there is already an implementation for exactly this case of three
columns, "YEAR", "MONTH" and "DAY". Please check:

https://github.com/apache/kylin/blob/master/core-metadata/src/main/java/org/apache/kylin/metadata/model/PartitionDesc.java#L301

The implementation concatenates the three columns and compares the result
with the given dates, for example:

CONCAT(FACT.YEAR, FACT.MONTH, FACT.DAY) >= '2018-01-01'
  AND CONCAT(FACT.YEAR, FACT.MONTH, FACT.DAY) < '2018-01-02'
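
To make the shape of that condition concrete, here is a minimal standalone
sketch that produces the same kind of string from a segment's start
(inclusive) and end (exclusive) timestamps. It is not Kylin's code: the class
name YmdConditionSketch and the methods buildCondition and dayStartMillis are
hypothetical, and the use of UTC and of epoch milliseconds for the segment
range are assumptions. For the actual behaviour, including how the three
columns are joined, please read the linked PartitionDesc.java.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical illustration only; not part of Kylin.
public class YmdConditionSketch {

    // Assumption: dates are rendered as yyyy-MM-dd in UTC.
    private static final DateTimeFormatter DAY =
            DateTimeFormatter.ofPattern("yyyy-MM-dd").withZone(ZoneOffset.UTC);

    // Helper: epoch milliseconds at 00:00 UTC of the given day.
    static long dayStartMillis(int year, int month, int day) {
        return LocalDate.of(year, month, day)
                .atStartOfDay(ZoneOffset.UTC)
                .toInstant()
                .toEpochMilli();
    }

    // Builds "CONCAT(...) >= 'start' AND CONCAT(...) < 'end'" for one segment.
    static String buildCondition(String tableAlias, long startInclusiveMs, long endExclusiveMs) {
        String concat = String.format("CONCAT(%s.YEAR, %s.MONTH, %s.DAY)",
                tableAlias, tableAlias, tableAlias);
        String start = DAY.format(Instant.ofEpochMilli(startInclusiveMs));
        String end = DAY.format(Instant.ofEpochMilli(endExclusiveMs));
        return concat + " >= '" + start + "' AND " + concat + " < '" + end + "'";
    }

    public static void main(String[] args) {
        long start = dayStartMillis(2018, 1, 1);
        long end = dayStartMillis(2018, 1, 2);
        System.out.println(buildCondition("FACT", start, end));
    }
}
```

Note that this is a string comparison, so it only selects the right
partitions if the concatenated column values sort the same way as the date
literals (for example, zero-padded months and days).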

However, the Kylin UI has no widget to enable this builder. You need to
manually modify the metadata of the Data Model with the "bin/metastore.sh"
tool and set the "partition_condition_builder" property, for example:

"partition_desc" : {
  "partition_date_column" : "KYLIN_SALES.PART_DT",
  "partition_time_column" : null,
  "partition_date_start" : 1325376000000,
  "partition_date_format" : "yyyy-MM-dd",
  "partition_time_format" : "HH:mm:ss",
  "partition_type" : "APPEND",
  "partition_condition_builder" : "org.apache.kylin.metadata.model.PartitionDesc$YearMonthDayPartitionConditionBuilder"
}

2018-05-15 21:42 GMT+08:00 Debdutto Chakraborty <debduttoc@gmail.com>:

> Hi,
>
> So, we have a hive table with analytical events data (impressions, clicks,
> conversions and such). A typical day produces around 50 to 100 million rows
> in this table with around 30 columns.
>
> We were trying to move to Kylin and prepare cubes from the data which is in
> this table.
>
> Now the problem is:
>
>    1. This hive table is partitioned on YEAR, MONTH and DAY, which are
>    separate columns.
>    2. Kylin does not accept such separate columns as "Partition Date
>    Column".
>    3. Running Hive queries on non partitioned columns is a nightmare.
>
>
> The only solution I see is to give the user an option during
> configuration to specify separate columns like this, and then build the
> query accordingly.
>
> My only concern is whether this will impact the cube's "Refresh Settings".
>
> Please let me know if this should be done. I'm open to doing the
> development and opening a PR.
>
> Regards,
> Debdutto Chakraborty
>



-- 
Best regards,

Shaofeng Shi 史少锋
