drill-dev mailing list archives

From "Rahul Challapalli (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2264) Incorrect data when we use aggregate functions with flatten
Date Wed, 18 Feb 2015 18:52:12 GMT
Rahul Challapalli created DRILL-2264:
----------------------------------------

             Summary: Incorrect data when we use aggregate functions with flatten
                 Key: DRILL-2264
                 URL: https://issues.apache.org/jira/browse/DRILL-2264
             Project: Apache Drill
          Issue Type: Bug
          Components: Functions - Drill
            Reporter: Rahul Challapalli
            Assignee: Jason Altekruse
            Priority: Critical


git.commit.id.abbrev=6676f2d

Data set:
{code}
{
  "uid":1,
  "lst_lst" : [[1,2],[3,4]]
}
{
  "uid":2,
  "lst_lst" : [[1,2],[3,4]]
}
{code}

The query below returns incorrect results (every row shows 6, the overall maximum, instead of the per-group maximum):
{code}
select uid,MAX( flatten(lst_lst[1]) + flatten(lst_lst[0])) from `temp.json` group by uid,
flatten(lst_lst[1]), flatten(lst_lst[0]);
+------------+------------+
|    uid     |   EXPR$1   |
+------------+------------+
| 1          | 6          |
| 1          | 6          |
| 1          | 6          |
| 1          | 6          |
| 2          | 6          |
| 2          | 6          |
| 2          | 6          |
| 2          | 6          |
+------------+------------+
{code}

However, if we use a subquery, Drill returns the correct data:
{code}
select uid, MAX(l1+l2) from (select uid,flatten(lst_lst[1]) l1, flatten(lst_lst[0]) l2 from
`temp.json`) sub group by uid, l1, l2;
+------------+------------+
|    uid     |   EXPR$1   |
+------------+------------+
| 1          | 4          |
| 1          | 5          |
| 1          | 5          |
| 1          | 6          |
| 2          | 4          |
| 2          | 5          |
| 2          | 5          |
| 2          | 6          |
+------------+------------+
{code}
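
For reference, the expected output can be reproduced outside Drill. The following is only a minimal Python sketch of the semantics I would expect, assuming that two flattens in the same select pair every element of lst_lst[1] with every element of lst_lst[0]. That assumption is consistent with the subquery output above, but the sketch is not Drill's implementation, and the helper name expected_rows is made up for illustration.

```python
from itertools import product

# The two records from the data set above.
records = [
    {"uid": 1, "lst_lst": [[1, 2], [3, 4]]},
    {"uid": 2, "lst_lst": [[1, 2], [3, 4]]},
]


def expected_rows(records):
    """Model of: select uid, MAX(l1 + l2) ... group by uid, l1, l2.

    Hypothetical helper, not part of Drill.
    """
    groups = {}
    for rec in records:
        # Assumption: two flattens yield every (l1, l2) pairing.
        for l1, l2 in product(rec["lst_lst"][1], rec["lst_lst"][0]):
            key = (rec["uid"], l1, l2)
            total = l1 + l2
            # MAX over the group; each group here holds a single row.
            groups[key] = max(groups.get(key, total), total)
    # Sorted only for a stable listing; Drill does not guarantee row order.
    return sorted((uid, total) for (uid, _, _), total in groups.items())


rows = expected_rows(records)
for uid, total in rows:
    print(uid, total)
```

Under that assumption each uid yields the values 4, 5, 5 and 6, which matches the subquery result and not the output of the first query.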


Also, using a single flatten yields correct results:
{code}
select uid,MAX(flatten(lst_lst[0])) from `temp.json` group by uid, flatten(lst_lst[0]);
+------------+------------+
|    uid     |   EXPR$1   |
+------------+------------+
| 1          | 1          |
| 1          | 2          |
| 2          | 1          |
| 2          | 2          |
+------------+------------+
{code}

Marked as critical since we return incorrect data. Let me know if you have any questions.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
