db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mike Matrigali (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-3926) Incorrect ORDER BY caused by index
Date Mon, 04 May 2009 20:02:30 GMT

     [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Mike Matrigali updated DERBY-3926:
----------------------------------


Let me know if I am understanding what is going on.
To me what looks like is happening is that the optimizer is looking
at each of the join nodes and says the following:
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3 :
    always ordered because we only get rows where table3.value = 21857
    but note that it is a non-unique key so multiple rows can come back.

table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
    here it see's that an index scan on table2 will return keys in table2.valu
    order.

Now it seems like the optimizer is saying that since table3 is always ordered
that it can then consider the result set to be solely ordered on
table2.value.  This would be a correct assumption if table3 returned a single
row, but not correct when it returns multiple rows.

In the bug case it looks to me like table3 returns 3 rows where
table3.value = 21857.

table3.id  | table3.value
----------------------------------------------
2147483653 |21857
2147483654 |21857
4294967297 |21857

Now the next part of the join has been forced to use
table2.nonUniqueOnValue_Table2, which is not useful other than providing
sorted access, so it turns into a table scan. The key is that it turns
into 3 table scans, searching for table3.id=table2.id leading to:

ID         |table2.VALUE |VALUE
----------------------------------------------
2147483653 |000002       |21857
2147483654 |000003       |21857
4294967297 |000001       |21857

The code does THREE full scan using the index and thus the rows are not
ordered by table2.value.  So even though we are using a key that should give
us the correct sorted order on table2.value, we are traversing it more than
once so it does not matter.  If the probes had been done using a single
scan with something like where (id = 2147483653 or 2147483654 or 4294967297)
it would have been sorted correctly, but that is not how joins work.

I believe the execution code is doing the right thing.   It seems to me that
the optimizer code is incorrectly using the "always sorted information"
incorrectly.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql,
test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024).
I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the
result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle
column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024),
PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented
me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring
the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message