drill-issues mailing list archives

From "Steven Phillips (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (DRILL-2083) order by on large dataset returns wrong results
Date Wed, 22 Apr 2015 21:19:59 GMT

     [ https://issues.apache.org/jira/browse/DRILL-2083?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steven Phillips updated DRILL-2083:
-----------------------------------
    Attachment: DRILL-2083.patch

> order by on large dataset returns wrong results
> -----------------------------------------------
>
>                 Key: DRILL-2083
>                 URL: https://issues.apache.org/jira/browse/DRILL-2083
>             Project: Apache Drill
>          Issue Type: Bug
>          Components: Execution - Data Types, Execution - Relational Operators
>    Affects Versions: 0.8.0
>            Reporter: Chun Chang
>            Assignee: Steven Phillips
>            Priority: Critical
>             Fix For: 1.0.0
>
>         Attachments: DRILL-2083.patch
>
>
> #Mon Jan 26 14:10:51 PST 2015
> git.commit.id.abbrev=3c6d0ef
> Test data has 1 million rows and can be accessed at 
> http://apache-drill.s3.amazonaws.com/files/complex.json.gz
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select count (t.id) from `complex.json` t;
> +------------+
> |   EXPR$0   |
> +------------+
> | 1000000    |
> +------------+
> {code}
> But order by returned 30 more rows (1,000,030 instead of 1,000,000).
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> select t.id from `complex.json` t order by t.id;
> ....
> | 999997     |
> | 999998     |
> | 999999     |
> | 1000000    |
> +------------+
> 1,000,030 rows selected (19.449 seconds)
> {code}
> Physical plan:
> {code}
> 0: jdbc:drill:schema=dfs.drillTestDirComplexJ> explain plan for select t.id from `complex.json` t order by t.id;
> +------------+------------+
> |    text    |    json    |
> +------------+------------+
> | 00-00    Screen
> 00-01      SingleMergeExchange(sort0=[0 ASC])
> 01-01        SelectionVectorRemover
> 01-02          Sort(sort0=[$0], dir0=[ASC])
> 01-03            HashToRandomExchange(dist0=[[$0]])
> 02-01              Scan(groupscan=[EasyGroupScan [selectionRoot=/drill/testdata/complex_type/json/complex.json, numFiles=1, columns=[`id`], files=[maprfs:/drill/testdata/complex_type/json/complex.json]]])
> {code}
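Read bottom-up, the plan above scans `id`, distributes rows by a hash of `id` (HashToRandomExchange), sorts each partition locally (Sort, followed by SelectionVectorRemover), and merges the sorted streams into a single ordered stream (SingleMergeExchange). A correct ORDER BY must emit a sorted permutation of its input, so for this data exactly 1,000,000 rows. The following is a minimal single-process Python model of that pipeline, plus a checker for the invariant the reported output violates. The function names, the fragment count, and the use of Python's built-in `hash` for partitioning are hypothetical choices made for illustration. This is not Drill's implementation and says nothing about the actual cause of the extra rows.

```python
import heapq
from collections import Counter
from typing import Iterable, List


def hash_partition(rows: Iterable[int], n: int) -> List[List[int]]:
    """Route each row to one of n buckets by hash of the sort key.
    Hypothetical stand-in for HashToRandomExchange(dist0=[[$0]])."""
    buckets: List[List[int]] = [[] for _ in range(n)]
    for r in rows:
        buckets[hash(r) % n].append(r)
    return buckets


def distributed_order_by(rows: List[int], n_fragments: int = 4) -> List[int]:
    """Simplified model of the plan: partition, sort locally, merge."""
    # Major fragment 02 -> 01: hash-distribute the scanned rows.
    buckets = hash_partition(rows, n_fragments)
    # Operators 01-02/01-01: each minor fragment sorts its own bucket.
    sorted_runs = [sorted(b) for b in buckets]
    # Operator 00-01: SingleMergeExchange does a k-way merge of sorted runs.
    return list(heapq.merge(*sorted_runs))


def check_sort_output(inp: List[int], out: List[int]) -> None:
    """Raise ValueError unless `out` is a sorted permutation of `inp`."""
    if len(out) != len(inp):
        raise ValueError(f"row count mismatch: {len(inp)} in, {len(out)} out")
    if Counter(out) != Counter(inp):
        raise ValueError("output is not a permutation of the input")
    if any(a > b for a, b in zip(out, out[1:])):
        raise ValueError("output is not sorted")
```

For example, `check_sort_output(ids, distributed_order_by(ids))` passes for `ids = list(range(1000, 0, -1))`, while appending 30 duplicated rows to the output (mimicking the symptom in this report) fails the first check with a row-count mismatch. Comparing counts first is cheap and catches exactly this class of bug before the more expensive multiset comparison.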



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
