kylin-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Li Yang <liy...@apache.org>
Subject Re: Query time hierarchy and time range in Kylin cube.
Date Sat, 26 May 2018 05:48:21 GMT
If it is possible to alter the table structure a little, the easiest way is
to encode date and hour in one column and use that column as kylin
partition column. E.g. have a new column called "date_and_hour". It takes
values like 2018051310 to represent 10 am of 2018-05-13. And your query may
go

select count(*) from fact where date_and_hour  between 2018051200 and
2018051323


Cheers
Yang

On Fri, May 18, 2018 at 10:55 AM, <vula2@viettel.com.vn> wrote:

> My current approach to time hierarchy and partition is as follow:
>
> -          I have partition date and hour column in Hive table to avoid
> full Hive table scan. Column names are partition_date, partition_hour
>
> -          I have separate fields in fact table name Year, Month, Day,
> Hour and use these column as hierarchy dimensions in Kylin cube build. I
> use *dictionary encoding*.
>
> -          When I want to query time range, I have to list all
> combination of time hierarchy dimensions, for example (Month, Day), in
> order to query.
>
> My query seems to be slower when the cube get bigger with the same time
> range. So I want to ask the best practice to design time hierarchy and
> query time range in Kylin. I see some support for timestamp in Streaming
> cube but I don’t see guideline for design time dimension for normal cube
> except partition date and hour in Hive.
>
> I also suspect that my time range query get slower because it currently
> need to scan all segment.
>
> I think we need
>
>
>

Mime
View raw message