db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (DERBY-6017) IN lists with constants may return wrong results
Date Tue, 18 Dec 2012 14:24:14 GMT

    [ https://issues.apache.org/jira/browse/DERBY-6017?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13534914#comment-13534914

Knut Anders Hatlen commented on DERBY-6017:

I believe this happens because of optimizations that are performed if the IN list consists
of constants only.

Such IN lists are sorted at compile time so that binary search can be used to find if there's
a match at run time. That's all good. However, the sorting and the binary search use different
ordering. The sorting (in ValueNodeList.sortInAscendingOrder()) uses the ordering of the type
with the highest precedence of the target and all the operands. The binary search (in DataType.in())
uses the ordering of the type with the highest precedence of each pair of values that it compares.

In the query above, this means:

The sorting happens using the type with the highest precedence of all the values. That is,
DOUBLE. All the four values in the IN list have the same DOUBLE value, so the list is already
sorted, regardless of how we order the actual values. But when binary search is performed
at run time, BIGINT semantics are used for some of the comparisons (those that involve BIGINTs
only) and DOUBLE comparison for others (those that involve a DOUBLE value). So the binary
search does not see the list as one that contain values that are all equal.

Additionally, during preprocessing, there is code to simplify the predicate if it's an IN
list where all values are equal. This check also uses the dominant type, DOUBLE, and finds
that the list indeed contains only one distinct value. It therefore eliminates the IN list
and replaces it with a simple equality check using just one of the values in the IN list.
That is, it rewrites the query from

    select * from t where x in (9223372036854775805, 9223372036854775806, 9223372036854775807,


    select * from t where x = 9223372036854775805

Those two queries are equivalent if the equality operator uses DOUBLE semantics. Unfortunately,
the information about what's the dominant type is lost when the IN list is eliminated, and
the equality check is performed using BIGINT semantics instead. The result is that only a
single row matches.

So I think there are two things that need to be fixed:

1) The sorting and the binary search must be made consistent.

2) The duplicate elimination must preserve type information.
> IN lists with constants may return wrong results
> ------------------------------------------------
>                 Key: DERBY-6017
>                 URL: https://issues.apache.org/jira/browse/DERBY-6017
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions:
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
> Given this table:
> ij> connect 'jdbc:derby:memory:db;create=true';
> ij> create table t(x bigint);
> 0 rows inserted/updated/deleted
> ij> insert into t values 9223372036854775805, 9223372036854775806, 9223372036854775807;
> 3 rows inserted/updated/deleted
> A query that uses an IN list that contains all the three values actually stored in the
table, returns all three rows as expected:
> ij> select * from t where x in (9223372036854775805, 9223372036854775806, 9223372036854775807);
> X                   
> --------------------
> 9223372036854775805 
> 9223372036854775806 
> 9223372036854775807 
> 3 rows selected
> However, if we add a value whose type precedence is higher, like a DOUBLE value, and
that value happens to be equal to the approximation of the other values in the IN list when
they are cast from BIGINT to DOUBLE, only one row is returned:
> ij> select * from t where x in (9223372036854775805, 9223372036854775806, 9223372036854775807,
> X                   
> --------------------
> 9223372036854775805 
> 1 row selected
> I believe this query should return all three rows too.

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

View raw message