drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-3689) incorrect results : aggregate AVG returns wrong results over results returned by LEAD function.
Date Fri, 11 Sep 2015 02:21:46 GMT

    [ https://issues.apache.org/jira/browse/DRILL-3689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14740032#comment-14740032
] 

Khurram Faraaz commented on DRILL-3689:
---------------------------------------

I tried with data that was not close to boundary values of INT64 type, I did not see the difference
in results with such data.

However, it will be good to inform user of the overflow error as soon as it is hit, rather
than return wrong results, that way the user knows that there was an overflow error. Just
returning results (in this case incorrect results) is not a good practice. Other systems like
DB2 report an overflow error as soon as it is hit.

> incorrect results : aggregate AVG returns wrong results over results returned by LEAD
function.
> -----------------------------------------------------------------------------------------------
>
>                 Key: DRILL-3689
>                 URL: https://issues.apache.org/jira/browse/DRILL-3689
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Relational Operators
>    Affects Versions: 1.2.0
>         Environment:  private-branch https://github.com/adeneche/incubator-drill/tree/new-window-funcs
>            Reporter: Khurram Faraaz
>            Assignee: Khurram Faraaz
>            Priority: Critical
>              Labels: window_function
>             Fix For: 1.2.0
>
>
> Aggregate AVG returns wrong results over results returned by LEAD function.
> results returned by Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) OVER(PARTITION
BY col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query;
> +-------------------------+
> |         EXPR$0          |
> +-------------------------+
> | 2.35195986941647008E17  |
> +-------------------------+
> 1 row selected (0.264 seconds)
> {code}
> Explain plan for above query from Drill
> {code}
> 0: jdbc:drill:schema=dfs.tmp> explain plan for SELECT  avg(lead_col1) FROM (SELECT
LEAD(col1) OVER(PARTITION BY col7 ORDER BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query;
> +------+------+
> | text | json |
> +------+------+
> | 00-00    Screen
> 00-01      Project(EXPR$0=[$0])
> 00-02        Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY NOT
NULL])
> 00-03          StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT($0)])
> 00-04            Project(w0$o0=[$3])
> 00-05              Window(window#0=[window(partition {2} order by [1] range between UNBOUNDED
PRECEDING and CURRENT ROW aggs [LEAD($1)])])
> 00-06                SelectionVectorRemover
> 00-07                  Sort(sort0=[$2], sort1=[$1], dir0=[ASC], dir1=[ASC])
> 00-08                    Project(T36¦¦*=[$0], col1=[$1], col7=[$2])
> 00-09                      Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:///tmp/FEWRWSPQQ_101]], selectionRoot=maprfs:/tmp/FEWRWSPQQ_101, numFiles=1,
columns=[`*`]]])
> {code}
> results returned by Postgres
> {code}
> postgres=# SELECT  avg(lead_col1) FROM (SELECT LEAD(col1) OVER(PARTITION BY col7 ORDER
BY col1) lead_col1 , col7 FROM FEWRWSPQQ_101) sub_query;
>          avg         
> ---------------------
>  1157533190627124568
> (1 row)
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message