drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mehant Baid (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-1897) Using 'avg' along with 'length' results in wrong output when nulls are present
Date Thu, 05 Mar 2015 03:11:38 GMT

     [ https://issues.apache.org/jira/browse/DRILL-1897?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mehant Baid updated DRILL-1897:
-------------------------------
    Attachment: DRILL-1897.patch

[~amansinha100] can you please review.

> Using 'avg' along with 'length' results in wrong output when nulls are present
> ------------------------------------------------------------------------------
>
>                 Key: DRILL-1897
>                 URL: https://issues.apache.org/jira/browse/DRILL-1897
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Functions - Drill
>            Reporter: Rahul Challapalli
>            Assignee: Mehant Baid
>             Fix For: 0.9.0
>
>         Attachments: DRILL-1897.patch
>
>
> git.commit.id.abbrev=9dfa4a1
> Dataset :
> {code}
> {
>  "col1":1,
>  "col2":"abc"
> }
> {
>  "col1":1,
>  "col2":null
> }
> {
>  "col1":null,
>  "col2":null
> }
> {code}
> The below query should return 3 instead of 1 (postgres correctly returns 3).
> Query :
> {code}
> select avg(length(col2)) from `a.json`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1.0        |
> +------------+
> {code} 
> Plan for the above query :
> {code}
> 00-00    Screen
> 00-01      Project(EXPR$0=[CAST(/(CastHigh(CASE(=($1, 0), null, $0)), $1)):ANY])
> 00-02        StreamAgg(group=[{}], agg#0=[$SUM0($0)], agg#1=[COUNT()])
> 00-03          Project($f0=[length($0)])
> 00-04            Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/data-shapes/wide-columns/flat/json/a.json,
numFiles=1, columns=[`col2`], files=[maprfs:/drill/testdata/data-shapes/wide-columns/flat/json/a.json]]])
> {code}
> The below query which does not involve a second function works as expected even though
nulls are present
> {code}
>  select avg(col1) from `a.json`;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1.0        |
> +------------+
> {code}
> Let me know if you need any more information.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message