spark-issues mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Reynold Xin (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (SPARK-15370) Some correlated subqueries return incorrect answers
Date Wed, 18 May 2016 01:36:12 GMT

     [ https://issues.apache.org/jira/browse/SPARK-15370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Reynold Xin updated SPARK-15370:
--------------------------------
    Target Version/s: 2.0.0

> Some correlated subqueries return incorrect answers
> ---------------------------------------------------
>
>                 Key: SPARK-15370
>                 URL: https://issues.apache.org/jira/browse/SPARK-15370
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.0.0
>            Reporter: Frederick Reiss
>
> The rewrite introduced in SPARK-14785 has the COUNT bug. The rewrite changes the semantics
of some correlated subqueries when there are tuples from the outer query block that do not
join with the subquery. For example:
> {noformat}
> spark-sql> create table R(a integer) as values (1);
> spark-sql> create table S(b integer);
> spark-sql> select R.a from R 
>          >     where (select count(*) from S where R.a = S.b) = 0;
> Time taken: 2.139 seconds                                                       
> spark-sql> 
> (returns zero rows; the answer should be one row of '1')
> {noformat}
> This problem also affects the SELECT clause:
> {noformat}
> spark-sql> select R.a, 
>          >     (select count(*) from S where R.a = S.b) as cnt 
>          > from R;
> 1	NULL
> (the answer should be "1 0")
> {noformat}
> Some subqueries with COUNT aggregates are *not* affected:
> {noformat}
> spark-sql> select R.a from R 
>          >     where (select count(*) from S where R.a = S.b) > 0;
> Time taken: 0.609 seconds
> spark-sql>
> (Correct answer)
> spark-sql> select R.a from R 
>          >     where (select count(*) + sum(S.b) from S where R.a = S.b) = 0;
> Time taken: 0.553 seconds
> spark-sql> 
> (Correct answer)
> {noformat}
> Other cases can trigger the variant of the COUNT bug for expressions involving NULL checks:
> {noformat}
> spark-sql> select R.a from R 
>          > where (select sum(S.b) is null from S where R.a = S.b);
> (returns zero rows, should return one row)
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org


Mime
View raw message