Return-Path: X-Original-To: apmail-drill-issues-archive@minotaur.apache.org Delivered-To: apmail-drill-issues-archive@minotaur.apache.org Received: from mail.apache.org (hermes.apache.org [140.211.11.3]) by minotaur.apache.org (Postfix) with SMTP id 0B31818B3A for ; Mon, 26 Oct 2015 18:01:06 +0000 (UTC) Received: (qmail 37761 invoked by uid 500); 26 Oct 2015 18:00:31 -0000 Delivered-To: apmail-drill-issues-archive@drill.apache.org Received: (qmail 37644 invoked by uid 500); 26 Oct 2015 18:00:31 -0000 Mailing-List: contact issues-help@drill.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@drill.apache.org Delivered-To: mailing list issues@drill.apache.org Received: (qmail 37354 invoked by uid 99); 26 Oct 2015 18:00:31 -0000 Received: from arcas.apache.org (HELO arcas) (140.211.11.28) by apache.org (qpsmtpd/0.29) with ESMTP; Mon, 26 Oct 2015 18:00:31 +0000 Received: from arcas.apache.org (localhost [127.0.0.1]) by arcas (Postfix) with ESMTP id E50892C1F60 for ; Mon, 26 Oct 2015 18:00:30 +0000 (UTC) Date: Mon, 26 Oct 2015 18:00:30 +0000 (UTC) From: "Zelaine Fong (JIRA)" To: issues@drill.apache.org Message-ID: In-Reply-To: References: Subject: [jira] [Updated] (DRILL-2543) Correlated subquery where outer table contains NULL values returns seemingly wrong result MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit X-JIRA-FingerPrint: 30527f35849b9dde25b450d4833f0394 [ https://issues.apache.org/jira/browse/DRILL-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zelaine Fong updated DRILL-2543: -------------------------------- Fix Version/s: (was: Future) 1.3.0 > Correlated subquery where outer table contains NULL values returns seemingly wrong result > ------------------------------------------------------------------------------------------ > > Key: DRILL-2543 > URL: https://issues.apache.org/jira/browse/DRILL-2543 > Project: Apache Drill > Issue Type: Bug > Components: Query Planning & Optimization > Affects Versions: 0.8.0 > Reporter: Victoria Markman > Assignee: Jinfeng Ni > Priority: Critical > Fix For: 1.3.0 > > > {code} > 0: jdbc:drill:schema=dfs> select * from t1; > +------------+------------+------------+ > | a1 | b1 | c1 | > +------------+------------+------------+ > | 1 | 2015-03-01 | aaaaa | > | 2 | 2015-03-02 | bbbbb | > | null | null | null | > +------------+------------+------------+ > 3 rows selected (0.064 seconds) > 0: jdbc:drill:schema=dfs> select * from t2; > +------------+------------+------------+ > | a2 | b2 | c2 | > +------------+------------+------------+ > | 5 | 2017-03-01 | a | > +------------+------------+------------+ > 1 row selected (0.07 seconds) > 0: jdbc:drill:schema=dfs> select t1.c1, count(*) from t1 where t1.b1 not in (select b2 from t2 where t1.a1 = t2.a2) group by t1.c1 order by t1.c1; > +------------+------------+ > | c1 | EXPR$1 | > +------------+------------+ > | aaaaa | 1 | > | bbbbb | 1 | > +------------+------------+ > 2 rows selected (0.32 seconds) > {code} > Postgres returns row from the outer table where a1 is null. > This is part that I don't understand, because join condition in the subquery should have eliminated row where a1 IS NULL. To me Drill result looks correct. Unless there is something different in correlated comparison semantics that I'm not aware of. > {code} > postgres=# select * from t1; > a1 | b1 | c1 > ----+------------+------- > 1 | 2015-03-01 | aaaaa > 2 | 2015-03-02 | bbbbb > | | > (3 rows) > {code} > Explain plan for the query: > {code} > 00-01 Project(c1=[$0], EXPR$1=[$1]) > 00-02 StreamAgg(group=[{0}], EXPR$1=[COUNT()]) > 00-03 Sort(sort0=[$0], dir0=[ASC]) > 00-04 Project(c1=[$0]) > 00-05 SelectionVectorRemover > 00-06 Filter(condition=[NOT(IS TRUE($3))]) > 00-07 HashJoin(condition=[=($1, $2)], joinType=[left]) > 00-09 Project($f1=[$0], $f3=[$2]) > 00-11 SelectionVectorRemover > 00-13 Filter(condition=[IS NOT NULL($1)]) > 00-15 Project(c1=[$1], b1=[$0], a1=[$2]) > 00-17 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1, numFiles=1, columns=[`c1`, `b1`, `a1`]]]) > 00-08 Project($f02=[$1], $f2=[$2]) > 00-10 StreamAgg(group=[{0, 1}], agg#0=[MIN($2)]) > 00-12 Sort(sort0=[$0], sort1=[$1], dir0=[ASC], dir1=[ASC]) > 00-14 Project($f0=[$1], $f02=[$2], $f1=[true]) > 00-16 HashJoin(condition=[=($2, $0)], joinType=[inner]) > 00-18 StreamAgg(group=[{0}]) > 00-20 Sort(sort0=[$0], dir0=[ASC]) > 00-22 Project($f0=[$1]) > 00-23 SelectionVectorRemover > 00-24 Filter(condition=[IS NOT NULL($0)]) > 00-25 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/test/t1]], selectionRoot=/test/t1, numFiles=1, columns=[`b1`, `a1`]]]) > 00-19 Project(a2=[$1], b2=[$0]) > 00-21 Scan(groupscan=[ParquetGroupScan [entries=[ReadEntryWithPath [path=maprfs:/test/t2]], selectionRoot=/test/t2, numFiles=1, columns=[`a2`, `b2`]]]) > {code} -- This message was sent by Atlassian JIRA (v6.3.4#6332)