drill-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Victoria Markman (JIRA)" <j...@apache.org>
Subject [jira] [Created] (DRILL-2543) Correlated subquery where outer table contains NULL values returns seemingly wrong result
Date Tue, 24 Mar 2015 22:13:54 GMT
Victoria Markman created DRILL-2543:
---------------------------------------

             Summary: Correlated subquery where outer table contains NULL values returns 
seemingly wrong result
                 Key: DRILL-2543
                 URL: https://issues.apache.org/jira/browse/DRILL-2543
             Project: Apache Drill
          Issue Type: Bug
          Components: Query Planning & Optimization
    Affects Versions: 0.8.0
            Reporter: Victoria Markman
            Assignee: Jinfeng Ni
            Priority: Critical


{code}
0: jdbc:drill:schema=dfs> select * from t1;
+------------+------------+------------+
|     a1     |     b1     |     c1     |
+------------+------------+------------+
| 1          | 2015-03-01 | aaaaa      |
| 2          | 2015-03-02 | bbbbb      |
| null       | null       | null       |
+------------+------------+------------+
3 rows selected (0.064 seconds)

0: jdbc:drill:schema=dfs> select * from t2;
+------------+------------+------------+
|     a2     |     b2     |     c2     |
+------------+------------+------------+
| 5          | 2017-03-01 | a          |
+------------+------------+------------+
1 row selected (0.07 seconds)

0: jdbc:drill:schema=dfs> select t1.c1, count(*) from t1 where t1.b1 not in (select b2
from t2 where t1.a1 = t2.a2) group by t1.c1 order by t1.c1;
+------------+------------+
|     c1     |   EXPR$1   |
+------------+------------+
| aaaaa      | 1          |
| bbbbb      | 1          |
+------------+------------+
2 rows selected (0.32 seconds)
{code}

Postgres returns row from the outer table where a1 is null.
This is part that I don't understand, because join condition in the subquery should have eliminated
row where a1 IS NULL. To me Drill result looks correct. Unless there is something different
in correlated comparison semantics that I'm not aware of.

{code}
postgres=# select * from t1;
 a1 |     b1     |  c1
----+------------+-------
  1 | 2015-03-01 | aaaaa
  2 | 2015-03-02 | bbbbb
    |            |
(3 rows)
{code}

Explain plan for the query:
{code}
00-01      Project(c1=[$0], EXPR$1=[$1])
00-02        StreamAgg(group=[{0}], EXPR$1=[COUNT()])
00-03          Sort(sort0=[$0], dir0=[ASC])
00-04            Project(c1=[$0])
00-05              SelectionVectorRemover
00-06                Filter(condition=[NOT(IS TRUE($3))])
00-07                  HashJoin(condition=[=($1, $2)], joinType=[left])
00-09                    Project($f1=[$0], $f3=[$2])
00-11                      SelectionVectorRemover
00-13                        Filter(condition=[IS NOT NULL($1)])
00-15                          Project(c1=[$1], b1=[$0], a1=[$2])
00-17                            Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/test/t1]], selectionRoot=/test/t1, numFiles=1, columns=[`c1`, `b1`, `a1`]]])
00-08                    Project($f02=[$1], $f2=[$2])
00-10                      StreamAgg(group=[{0, 1}], agg#0=[MIN($2)])
00-12                        Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC])
00-14                          Project($f0=[$1], $f02=[$2], $f1=[true])
00-16                            HashJoin(condition=[=($2, $0)], joinType=[inner])
00-18                              StreamAgg(group=[{0}])
00-20                                Sort(sort0=[$0], dir0=[ASC])
00-22                                  Project($f0=[$1])
00-23                                    SelectionVectorRemover
00-24                                      Filter(condition=[IS NOT NULL($0)])
00-25                                        Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/test/t1]], selectionRoot=/test/t1, numFiles=1, columns=[`b1`, `a1`]]])
00-19                              Project(a2=[$1], b2=[$0])
00-21                                Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath
[path=maprfs:/test/t2]], selectionRoot=/test/t2, numFiles=1, columns=[`a2`, `b2`]]])
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Mime
View raw message