drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Khurram Faraaz (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2953) Group By + Order By query results are not ordered.
Date Mon, 04 May 2015 21:43:06 GMT
Khurram Faraaz created DRILL-2953:
-------------------------------------

             Summary: Group By + Order By query results are not ordered.
                 Key: DRILL-2953
                 URL: https://issues.apache.org/jira/browse/DRILL-2953
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 0.9.0
         Environment: 10833d2cae9f5312cf0e31f8c9f3f8a9dcdc0c45 | Commit 0.9.0 release version.
| 03.05.2015 @ 14:56:56 EDT
            Reporter: Khurram Faraaz
            Assignee: Jinfeng Ni
            Priority: Critical


Group by + order by query does not return results in correct order. Sort is performed before
the aggregation is done, which should not be the case.
Test was performed on 4 node cluster on CentOS.

{code}
0: jdbc:drill:> select cast(columns[0] as int) c1 from `testWindow.csv` t2 where t2.columns[0]
is not null group by columns[0] order by columns[0];
+------------+
|     c1     |
+------------+
| 10         |
| 100        |
| 113        |
| 119        |
| 2          |
| 50         |
| 55         |
| 57         |
| 61         |
| 67         |
| 89         |
+------------+
11 rows selected (0.218 seconds)
{code}

Explain plan for that query that returns wrong results.
{code}
0: jdbc:drill:> explain plan for select cast(columns[0] as int) c1 from `testWindow.csv`
t2 where t2.columns[0] is not null group by columns[0] order by columns[0];
+------------+------------+
|    text    |    json    |
+------------+------------+
| 00-00    Screen
00-01      Project(c1=[$0])
00-02        Project(c1=[CAST($0):INTEGER], EXPR$1=[$0])
00-03          StreamAgg(group=[{0}])
00-04            Sort(sort0=[$0], dir0=[ASC])
00-05              Filter(condition=[IS NOT NULL($0)])
00-06                Project(ITEM=[ITEM($0, 0)])
00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/testWindow.csv, numFiles=1,
columns=[`columns`[0]], files=[maprfs:/tmp/testWindow.csv]]])
{code} 

Incorrect results , not in order.
{code}
0: jdbc:drill:> select cast(columns[0] as int) from `testWindow.csv` t2 where t2.columns[0]
is not null group by columns[0] order by columns[0];
+------------+
|   EXPR$0   |
+------------+
| 10         |
| 100        |
| 113        |
| 119        |
| 2          |
| 50         |
| 55         |
| 57         |
| 61         |
| 67         |
| 89         |
+------------+
11 rows selected (0.214 seconds)
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message