drill-issues mailing list archives

From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DRILL-2953) Group By + Order By query results are not ordered.
Date Tue, 12 May 2015 06:01:00 GMT

    [ https://issues.apache.org/jira/browse/DRILL-2953?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14539307#comment-14539307 ]

Jinfeng Ni commented on DRILL-2953:
-----------------------------------

I could not locate the exact cause of the problem. It appears to involve an interaction between
DrillPushProjectPastJoin/FilterRule and ProjectMergeRule when the project has an ITEM operator
in its expression.

An otherwise identical query that does not use the ITEM operator runs successfully:

{code}
    test( " explain plan for " +
        "select bi.n_nationkey, ci.n_nationkey, di.n_nationkey, fi.n_nationkey, ini.n_nationkey, vi.n_nationkey" +
        " from cp.`tpch/nation.parquet` bi, " +
        "      cp.`tpch/nation.parquet` ci , " +
        "      cp.`tpch/nation.parquet` di, " +
        "      cp.`tpch/nation.parquet` fi," +
        "      cp.`tpch/nation.parquet` ini," +
        "      cp.`tpch/nation.parquet` vi" +
        " where bi.n_nationkey = ci.n_nationkey " +
        " and di.n_nationkey = fi.n_nationkey" +
        " and ini.n_nationkey = vi.n_nationkey " +
        " and ci.n_nationkey = di.n_nationkey" +
        " and fi.n_nationkey = ini.n_nationkey" );
{code}

Also, if I remove one join condition from the original query, so that the join conditions
no longer form a cycle, the query also runs successfully:

{code}
    test( " explain plan for " +
        "select bi.columns[0], ci.columns[0], di.columns[0], fi.columns[0], ini.columns[0], vi.columns[0]" +
        " from dfs.`/tmp/1.csv` bi, " +
        "      dfs.`/tmp/2.csv` ci , " +
        "      dfs.`/tmp/3.csv` di, " +
        "      dfs.`/tmp/4.csv` fi," +
        "      dfs.`/tmp/5.csv` ini," +
        "      dfs.`/tmp/6.csv` vi" +
        " where bi.columns[0] = ci.columns[0]" +
        " and di.columns[0] = fi.columns[0]" +
        " and ini.columns[0] = vi.columns[0] " +
        " and ci.columns[0] = di.columns[0]" +
        " and fi.columns[0] = ini.columns[0]" );
{code}
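
To make the "circle" in the join conditions concrete, here is a minimal sketch (plain Java, not Drill code) that treats each table alias as a node and each equality condition as an edge, then uses union-find to report whether the conditions form a cycle. The class and method names are hypothetical. The condition that was removed is not shown in this message, so the sixth edge `vi = bi` below is only an assumed stand-in.

```java
import java.util.HashMap;
import java.util.Map;

public class JoinGraphCycleCheck {
    // Union-find parent pointers, keyed by table alias.
    private final Map<String, String> parent = new HashMap<>();

    // Returns the representative of the set containing x (with path compression).
    private String find(String x) {
        parent.putIfAbsent(x, x);
        String p = parent.get(x);
        if (!p.equals(x)) {
            p = find(p);
            parent.put(x, p);
        }
        return p;
    }

    // True if some equality condition links two aliases that are already
    // connected through earlier conditions, i.e. the join graph has a cycle.
    public static boolean hasCycle(String[][] equiJoins) {
        JoinGraphCycleCheck uf = new JoinGraphCycleCheck();
        for (String[] edge : equiJoins) {
            String a = uf.find(edge[0]);
            String b = uf.find(edge[1]);
            if (a.equals(b)) {
                return true;
            }
            uf.parent.put(a, b);
        }
        return false;
    }

    public static void main(String[] args) {
        // The five conditions from the query above: a simple chain.
        String[][] chain = {
            {"bi", "ci"}, {"di", "fi"}, {"ini", "vi"}, {"ci", "di"}, {"fi", "ini"}
        };
        // Same conditions plus an ASSUMED sixth one (vi = bi) that closes the loop.
        String[][] withAssumedSixth = {
            {"bi", "ci"}, {"di", "fi"}, {"ini", "vi"}, {"ci", "di"}, {"fi", "ini"},
            {"vi", "bi"}
        };
        System.out.println("five conditions cyclic: " + hasCycle(chain));
        System.out.println("with assumed sixth cyclic: " + hasCycle(withAssumedSixth));
    }
}
```

With the five conditions listed above the aliases form a chain bi, ci, di, fi, ini, vi, so no cycle is reported; any extra condition between two aliases already on that chain would close one.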
 
Given the above analysis, I'd like to defer this JIRA and target it for 1.1.0.


> Group By + Order By query results are not ordered.
> --------------------------------------------------
>
>                 Key: DRILL-2953
>                 URL: https://issues.apache.org/jira/browse/DRILL-2953
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Query Planning & Optimization
>    Affects Versions: 0.9.0
>         Environment: 10833d2cae9f5312cf0e31f8c9f3f8a9dcdc0c45 | Commit 0.9.0 release version. | 03.05.2015 @ 14:56:56 EDT
>            Reporter: Khurram Faraaz
>            Assignee: Jinfeng Ni
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: 0001-DRILL-2953-Ensure-sort-would-be-enforced-when-a-cast.patch
>
>
> Group by + order by query does not return results in the correct order. Sort is performed before the aggregation is done, which should not be the case.
> Test was performed on a 4-node cluster on CentOS.
> {code}
> 0: jdbc:drill:> select cast(columns[0] as int) c1 from `testWindow.csv` t2 where t2.columns[0] is not null group by columns[0] order by columns[0];
> +------------+
> |     c1     |
> +------------+
> | 10         |
> | 100        |
> | 113        |
> | 119        |
> | 2          |
> | 50         |
> | 55         |
> | 57         |
> | 61         |
> | 67         |
> | 89         |
> +------------+
> 11 rows selected (0.218 seconds)
> {code}
> Explain plan for the query that returns wrong results:
> {code}
> 0: jdbc:drill:> explain plan for select cast(columns[0] as int) c1 from `testWindow.csv` t2 where t2.columns[0] is not null group by columns[0] order by columns[0];
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      Project(c1=[$0])
> 00-02        Project(c1=[CAST($0):INTEGER], EXPR$1=[$0])
> 00-03          StreamAgg(group=[{0}])
> 00-04            Sort(sort0=[$0], dir0=[ASC])
> 00-05              Filter(condition=[IS NOT NULL($0)])
> 00-06                Project(ITEM=[ITEM($0, 0)])
> 00-07                  Scan(groupscan=[EasyGroupScan [selectionRoot=/tmp/testWindow.csv, numFiles=1, columns=[`columns`[0]], files=[maprfs:/tmp/testWindow.csv]]])
> {code} 
> Incorrect results, not in order.
> {code}
> 0: jdbc:drill:> select cast(columns[0] as int) from `testWindow.csv` t2 where t2.columns[0] is not null group by columns[0] order by columns[0];
> +------------+
> |   EXPR$0   |
> +------------+
> | 10         |
> | 100        |
> | 113        |
> | 119        |
> | 2          |
> | 50         |
> | 55         |
> | 57         |
> | 61         |
> | 67         |
> | 89         |
> +------------+
> 11 rows selected (0.214 seconds)
> {code}
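
The row order reported above coincides with sorting the values as text rather than as integers. The sketch below (plain Java, not Drill code; the class and method names are hypothetical) sorts the eleven reported values both ways for comparison. It only illustrates the symptom and does not establish the root cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class SortOrderDemo {
    // The eleven values, in the order the query returned them.
    public static final List<String> REPORTED = Arrays.asList(
        "10", "100", "113", "119", "2", "50", "55", "57", "61", "67", "89");

    // Sorts a copy of the values as strings (character-by-character comparison).
    public static List<String> sortedAsText(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.naturalOrder());
        return copy;
    }

    // Sorts a copy of the values by their integer value.
    public static List<String> sortedAsInt(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        copy.sort(Comparator.comparingInt((String s) -> Integer.parseInt(s)));
        return copy;
    }

    public static void main(String[] args) {
        System.out.println("as text: " + sortedAsText(REPORTED));
        System.out.println("as int:  " + sortedAsInt(REPORTED));
        System.out.println("reported order equals text order: "
            + sortedAsText(REPORTED).equals(REPORTED));
    }
}
```

Sorting as text leaves the reported order unchanged, while sorting as integers would start with 2 and end with 119.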



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)