kylin-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "lifan.su" <suli...@lvwan.com>
Subject Re: Kylin fails during calcite execution when performing window query
Date Mon, 29 Apr 2019 13:42:13 GMT
In addition, the behavior occurred on both Kylin-3.0.0-alpha and master version.


------------------------------------------------------------------
From:lifan.su<sulifan@lvwan.com>
Send Time:2019年4月29日(Monday) 21:34
To:user <user@kylin.apache.org>
Subject:Kylin fails during calcite execution when performing window query

Hello,

I am trying to use Kylin to perform distinct count over a running window. I
tried two different methods, but Kylin failed to execute both. I can
reproduce the problem using following queries.

======== Query 1 ========
SELECT
    t.part_dt,
    ( SELECT
        count( distinct buyer_id )
    FROM
        KYLIN_SALES
    WHERE part_dt BETWEEN TIMESTAMPADD ( day, -3, t.part_dt ) AND t.part_dt
    ) as total
FROM ( values
    ( date '2012-01-01' ),
    ( date '2012-01-02' ),
    ( date '2012-01-03' ),
    ( date '2012-01-04' ),
    ( date '2012-01-05' ),
    ( date '2012-01-06' )
) as t( part_dt )

======== Query 2 ========
SELECT
  t.part_dt , 
  count(distinct kylin_sales.buyer_id) over (
    partition by t.part_dt 
    order by t.part_dt 
    range interval '3' day preceding )
  as buyers
FROM 
  ( values
    ( date '2012-02-01' ),
    ( date '2012-02-02' ),
    ( date '2012-02-03' ),
    ( date '2012-02-04' ),
    ( date '2012-02-05' ),
    ( date '2012-02-06' )
  ) as t( part_dt ) 
  LEFT OUTER JOIN KYLIN_SALES
  ON t.part_dt = KYLIN_SALES.PART_DT

I have added following model and cubes as follows:
Model: 
  Dimensions: KYLIN_SALES: {"PART_DT", "SELLER_ID"}
  Measures: KYLIN_SALES.BUYER_ID
Cube:
  Dimensions: PART_DT, SELLER_ID
  Measures: COUNT_DISTINCT(exact): Value:KYLIN_SALES.BUYER_ID
  Rowkeys: SELLER_ID (dict), PART_DT (date)

This behavior occurred on both CDH-5.8.2 (using package for CDH 5.7)
and FusionInsight V100R002C80SPC200 (using binary package for HBase 1.x).
Currently I don't have a sandbox do reproduce this behavior.

The equivalent queries of the first query on Postgres and MySQL can be found
at following links. Because of the different dialects supported by the RDBMS
engines, the queries are modified correspondingly from the Kylin dialect. 

http://sqlfiddle.com/#!17/1ded8/2 and http://sqlfiddle.com/#!9/4f4afe/8

Second query does not have equivalent ones in Postgres or MySQL since both
do not support distinct in window function.

Corresponding logs during execution of the queries are attected.

Is this behavior expected? If so, is there a workaround for this case? 

Best regards.
Lifan Su

Mime
View raw message