kylin-issues mailing list archives

From "shengxiaopeng (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (KYLIN-3703) get negative value when query kylin
Date Mon, 26 Nov 2018 09:17:00 GMT

    [ https://issues.apache.org/jira/browse/KYLIN-3703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16698654#comment-16698654 ]

shengxiaopeng commented on KYLIN-3703:
--------------------------------------

The source data is preprocessed before entering Kafka; values range from 0 to 50000. Since the metric sum(prf_pl_fs) overflows even though its return type is defined as bigint in the measure definition, do we need to restrict prf_pl_fs to a long type in the source table?
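For reference, the negative value in the issue matches exactly what 32-bit wraparound would produce. A minimal sketch (plain Python, standard library only) reinterpreting the true hourly sum as a signed 32-bit integer:

```python
import ctypes

# The correct hourly sum computed from the per-minute results in the issue.
total = 2977568406

# Reinterpret the 64-bit value as a signed 32-bit integer, which is what
# happens if the sum passes through an int-typed column or variable.
wrapped = ctypes.c_int32(total).value

print(wrapped)  # -1317398890, the negative value returned by the hourly query

# Equivalent modular arithmetic: the value exceeds 2**31 - 1, so it wraps.
print(total - 2**32)  # -1317398890
```

This suggests the sum is being truncated to 32 bits somewhere in the pipeline, even though bigint (64-bit) comfortably holds 2977568406.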

> get negative value when query  kylin
> ------------------------------------
>
>                 Key: KYLIN-3703
>                 URL: https://issues.apache.org/jira/browse/KYLIN-3703
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v2.5.1
>            Reporter: shengxiaopeng
>            Priority: Major
>
> version: Kylin 2.5.1
> There is a cube built in real time; one metric is the average time consumed. *The source data contains no negative values, yet a negative value appears when aggregating by hour*, while aggregation by minute is normal.
>  
> *1. Metric definition*
> |Name|Expression|Parameters|Return Type|
> |SUM_FS|SUM|Value:APM_FIRSTSCREEN_WEB.PRF_PL_FS, Type:column|bigint|
>  
>  *2. Aggregate SQL and results by hour, where a negative measure value occurs*
>  ```
>  select 
>  hour_start
>  ,count(*) as amount
>  ,sum(prf_pl_fs) as sum_pl_fs
>  from DY_APM.APM_FIRSTSCREEN_WEB
>  where 1=1 and minute_start >= '2018-11-23 20:00:00' and minute_start < '2018-11-23 21:00:00'
>  group by hour_start 
>  order by hour_start asc 
>  limit 15000
>  ```
>   
> |hour_start|amount|sum_pl_fs|
> |2018-11-23 20:00:00|974466|-1317398890|
> *3. Aggregate SQL and results by minute*
>  ```
>  select 
>  minute_start
>  ,count(*) as amount
>  ,sum(prf_pl_fs) as sum_pl_fs
>  from DY_APM.APM_FIRSTSCREEN_WEB
>  where 1=1 and minute_start >= '2018-11-23 20:00:00' and minute_start < '2018-11-23 21:00:00'
>  group by minute_start 
>  order by minute_start asc 
>  limit 15000
>  ```
> |minute_start|amount|sum_pl_fs|
> |2018-11-23 20:21:00|2629|8107124|
> |2018-11-23 20:22:00|12216|36558869|
> |2018-11-23 20:23:00|12800|38401450|
> |...|(omitted)|...|
>  
> 4. I suspect this is caused by integer overflow.
> I summed the per-minute data to recompute the hourly total: it comes to *2977568406*, which is well within the range of bigint.
>  ```
>  select 
>  sum(t.amount) amount
>  ,sum(t.sum_pl_fs) sum_pl_fs
>  ,sum(t.sum_pl_fs) / sum(t.amount) as avg_pl_fs
>  from 
>  (
>  select 
>  minute_start
>  ,count(*) as amount
>  ,sum(prf_pl_fs) as sum_pl_fs
>  from DY_APM.APM_FIRSTSCREEN_WEB
>  where 1=1 and minute_start >= '2018-11-23 20:00:00' and minute_start < '2018-11-23 21:00:00'
>  group by minute_start 
>  order by minute_start asc 
>  limit 15000
>  ) as t
>  ```
> |amount|sum_pl_fs|avg_pl_fs|
> |974466|2977568406|3055|
>  
> After verification: casting 2977568406 to a 32-bit int yields -1317398890.
>  
>  
> *Is this a problem with my configuration or my query? How should it be solved?*
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
