drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Jinfeng Ni (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-1957) NULL values in NOT IN sub-queries could lead to incorrect query result.
Date Thu, 08 Jan 2015 02:09:34 GMT
Jinfeng Ni created DRILL-1957:
---------------------------------

             Summary: NULL values in NOT IN sub-queries could lead to incorrect query result.

                 Key: DRILL-1957
                 URL: https://issues.apache.org/jira/browse/DRILL-1957
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
            Reporter: Jinfeng Ni
            Assignee: Jinfeng Ni


NULL values in NOT IN sub-queries will disqualify all the rows, and hence return 0 row for
a query. However, Drill could return some rows, due to issue in the query planning.

For example, consider the following sample data :
{code}
select deptno from dfs.`/Users/jni/work/data/json/e.json` ;
+------------+
|   deptno   |
+------------+
| 10         |
| 20         |
| null       |
+------------+
3 rows selected (0.062 seconds)
0: jdbc:drill:zk=local> select deptno from dfs.`/Users/jni/work/data/json/d.json` ;
+------------+
|   deptno   |
+------------+
| 10         |
| null       |
+------------+
2 rows selected (0.062 seconds)
{code}

The following query should return 0 row, but Drill actually return 1 row.

{code}
select deptno from dfs.`/Users/jni/work/data/json/e.json` where deptno not in (select deptno
from dfs.`/Users/jni/work/data/json/d.json`);
+------------+
|   deptno   |
+------------+
| 20         |
+------------+
1 row selected (0.286 seconds)
{code}

In comparison, here is the result run on posture.
{code}
mydb=# select * from e;
 deptno | ename
--------+-------
     10 | Alice
   NULL | Tom
     20 | Bob
(3 rows)

mydb=# delete from d where deptno = 20;
DELETE 1
mydb=# select * from d;
 deptno | dname
--------+-------
     10 | Sales
   NULL | Eng
(2 rows)

mydb=# select deptno from e where deptno not in (select deptno from d);
 deptno
--------
(0 rows)
{code}

The fix seems to be in OPTIQ-373, which will produce a valid plan. However, such plan contain
a cross-join, which is not supported Drill, and hence hit CanNotPlanException in Drill.





--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message