drill-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Paul Rogers (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-5142) TestWindowFrame.testUnboundedFollowing relies on side effects
Date Wed, 21 Dec 2016 00:40:58 GMT
Paul Rogers created DRILL-5142:
----------------------------------

             Summary: TestWindowFrame.testUnboundedFollowing relies on side effects
                 Key: DRILL-5142
                 URL: https://issues.apache.org/jira/browse/DRILL-5142
             Project: Apache Drill
          Issue Type: Bug
            Reporter: Paul Rogers
            Assignee: Paul Rogers
            Priority: Minor


The unit test {{TestWindowFrame.testUnboundedFollowing}} is one of a family of tests that
does the same query in two different ways, using the results of the second to verify the first.
Unfortunately, this particular tests "works" only because it relies on undefined implementation
artifacts about the way the "verification" query is run in Drill.

Here is the query under test:
{code}
SELECT 
  position_id,
  employee_id,
  LAST_VALUE(employee_id)
    OVER(PARTITION BY position_id
         ORDER by employee_id
         RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS `last_value`
FROM
  dfs_test.`%s/window/b4.p4`
{code}
With expected results as follows:
{code}
1,0,9
...
1,9,9
{code}
Here is the "expected results" query:
{code}
SELECT
  position_id,
  employee_id,
  MAX(employee_id) OVER(PARTITION BY position_id) AS `last_value`
FROM (
  SELECT *
  FROM dfs_test.`%s/window/b4.p4`
  ORDER BY position_id, employee_id
)
{code}
The above happens to produce the correct results only because the query executes in a single
fragment. The query produces correct results with the "unmanaged" external sort, but produces
the following (valid) results with the managed external sort:
{code}
1,0,9
1,2,9
...
1,9,9
1,1,9
{code}
The query relies on the inner query sort order "showing through" to the outer query. But,
if the query were distributed, the outer query would be unordered. Hence, the verification
query just happened to work, but is not actually valid.

The proper solution is to modify the verification query to move the ORDER BY to the outer
query:
{code}
...
FROM (
  SELECT *
  FROM dfs_test.`%s/window/b4.p4`
)
  ORDER BY position_id, employee_id
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message