drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5144) Query with two identical windows produces incorrect results
Date Wed, 21 Dec 2016 07:11:58 GMT
Paul Rogers created DRILL-5144:
----------------------------------

             Summary: Query with two identical windows produces incorrect results
                 Key: DRILL-5144
                 URL: https://issues.apache.org/jira/browse/DRILL-5144
             Project: Apache Drill
          Issue Type: Bug
    Affects Versions: 1.8.0
            Reporter: Paul Rogers


Consider the following query (from {{TestWindowFrame.test4657}}):

{code}
select row_number() over(order by position_id) rn, 
        rank() over(order by position_id) rnk
from dfs_test.`%s/window/b3.p2`
{code}

The planner produces the following plan:

{code}
{
  ...
  "graph" : [ {
    "pop" : "fs-scan",
    ...
    "columns" : [ "`position_id`" ],
    ...
  }, {
    "pop" : "external-sort",
    "@id" : 8,
    "child" : 9,
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    ...
  }, {
    "pop" : "selection-vector-remover",
    ...
  }, {
    "pop" : "window",
    "@id" : 6,
    "child" : 7,
    "aggregations" : [ {
      "ref" : "`w0$o0`",
      "expr" : "row_number(1) "
    } ],
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    "frameUnitsRows" : true,
    "start" : {
      "unbounded" : true,
      "offset" : -9223372036854775808
    },
    "end" : {
      "unbounded" : false,
      "offset" : 0
    },
    ...
  }, {
    "pop" : "external-sort",
    "@id" : 5,
    "child" : 6,
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    ...
  }, {
    "pop" : "selection-vector-remover",
    ...
  }, {
    "pop" : "window",
    ...
    "aggregations" : [ {
      "ref" : "`w1$o0`",
      "expr" : "rank(1) "
    } ],
    "orderings" : [ {
      "order" : "ASC",
      "expr" : "`position_id`",
      "nullDirection" : "UNSPECIFIED"
    } ],
    "frameUnitsRows" : false,
    "start" : {
      "unbounded" : true,
      "offset" : -9223372036854775808
    },
    "end" : {
      "unbounded" : false,
      "offset" : 0
    },
    ...
  }, {
    "pop" : "project",
    "@id" : 2,
    "exprs" : [ {
      "ref" : "`$0`",
      "expr" : "`w0$o0`"
    }, {
      "ref" : "`$1`",
      "expr" : "`w1$o0`"
    } ],
    ...
  }, {
    "pop" : "project",
    ...
    "exprs" : [ {
      "ref" : "`rn`",
      "expr" : "`$0`"
    }, {
      "ref" : "`rnk`",
      "expr" : "`$1`"
    } ],
    ...
  }, {
    "pop" : "screen",
    ...
  } ]
}
{code}

Note that the plan sorts the input data on {{position_id}} twice. One would work just as well.

Note also that, since Drill's sort is unstable, the order of like keys can change between
sorts; resulting in non-deterministic results (which show up in failures of the test mentioned
earlier).

Indeed, earlier versions of the tests "fudged" the external sort to make it stable so that
the tests would pass; but those tests hid the instability of this particular query. Without
the fudge, results are incorrect:

{code}
rn, rnk
1,1
3,1
4,1
5,1
2,1 <-- Wrong results
31,6 <-- Wrong results
6,6
...
30,6
32,6
...
60,6
{code}

The expected results is the equivalent of the following (invalid) query:

{code}
select row_number() rn, rank() rnk
         over(order by position_id)
from dfs_test.`%s/window/b3.p2`
{code}

{{TestWindowFrame.test4657}} will be disabled until a fix for this problem is available.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message