db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index
Date Thu, 21 May 2009 20:17:45 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711781#action_12711781
] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I went through the optimizer code for following query
connect 'jdbc:derby:wombat';
select * from --DERBY-PROPERTIES joinOrder=FIXED 
TENKTUP1 -- DERBY-PROPERTIES index=TK1UNIQUE1 
, TENKTUP2 -- DERBY-PROPERTIES index=TK2UNIQUE1 
where TENKTUP1.unique1 = TENKTUP2.unique1 
order by TENKTUP1.unique1, TENKTUP2.unique1; 


During the optimize phase, for an optimizable, everytime we are considering a new access path
for it, we check if the conglomerate being considered for the optimizable is an index conglomerate.
If yes, then we check if the current ordering for the given join order already has the index
columns in it. That check is done by the following call in FromBaseTable.nextAccessPath: 478

if ( ! rowOrdering.orderedOnColumn(isAscending[i] ? RowOrdering.ASCENDING : RowOrdering.DESCENDING,
	getTableNumber(), baseColumnPositions[i]))
The RowOrderingImpl.orderedOnColumn:119 first checks if the optimizable we are dealing with
is marked as always ordered(I think this happens if the optimizable is a one-row table). If
yes, then the we will return true from RowOrderingImpl.orderedOnColumn. If not, we check if
there was a predicate on the index column which makes it always ordered(line 127). If not,then
we check if the column is already being ordered by checking ordering vector at line 133. If
not, we go ahead and add the index column to the ordering vector in currentRowOrdering object.

Going to the query above when the optimizer is going through the join order [0, -1] which
means it is considering TENKTUP1 at the outermost join order position and no optimizable has
yet been considered in the next position yet. For TENKTUP1, we go through the above piece
of code. The method RowOrderingImpl.orderedOnColumn is going to return false for the index
column UNIQUE1(index TK1UNIQUE1 is on that column) because 
a)TENKTUP1 is not always ordered meaning it is not one-row table
b)there is no constant predicate on TENKTUP1.UNIQUE1
c)there is no other ordering on TENKTUP1.UNIQUE1

Since for our query, the index column UNIQUE1 is not already ordered yet in the currentRowOrdering
object, we go ahead and add it to the ordering vector inside currentRowOrdering object (this
is done in FromBaseTable.nextAccessPath at line: 484 with following 2 code lines.)
	rowOrdering.nextOrderPosition(isAscending[i] ? RowOrdering.ASCENDING : RowOrdering.DESCENDING);
	rowOrdering.addOrderedColumn(isAscending[i] ? RowOrdering.ASCENDING : RowOrdering.DESCENDING,
		getTableNumber(), baseColumnPositions[i]);
This adding of TENKTUP1.UNIQUE1 to ordering object will indicate that the rows are ordered
on that column. So, at the end of considering access path for [0, -1] join order position,
currentRowOrdering object will have TENKTUP1.UNIQUE1 in ordering vector and it will be marked
as the current column ordering (this is done by having currentColumnOrdering = TENKTUP1.UNIQUE1).


Next, we work on finding the cost of the given access path for [0, -1] join order. Once we
find the cost, we check to see if it makes sense to avoid sort on it from what we know so
far. This is done in OptimizerImpl.costBasedCostOptimizable through following piece of code
	if (considerSortAvoidance && requiredRowOrdering.sortRequired(currentRowOrdering,
		assignedTableMap) == RequiredRowOrdering.NOTHING_REQUIRED)
What we are doing here is checking if the ordering requested by the user for the current table
(that information is in requiredRowOrdering) is satisfied by row ordering provided by the
join order being considered so far. The row ordering provided for the current join order is
in currentRowOrdering. In our specific eg, since user has requested for ordering on TENKTUP1.UNIQUE1
and we are ordering on it because of the index that is being used, the above call to requiredRowOrdering.sortRequired
is going to return sorting not required.

Next, we put the next optimizable in the join order, meaning moving from [0,-1] join order
to [0, 1]. At this point, optimizer is considering TENKTUP2 in the 2nd join order position.
We have asked the optimizer to use index TK2UNIQUE1 for optimizable TENKTUP2. We go through
the same code path as above for this optimizable. We will find that the index column TENKTUP2.unique1
needs to be added to ordering vector in currentRowOrdering because TENKTUP2 is not one-row
table and there is no constant predicate on TENKTUP2.unique1 and hence the current index being
considered on TENKTUP2 is going to provide ordering on TENKTUP2.unique1. Next, we see if sort
can be aovided for TENKTUP2 by comparing required row ordering against current row ordering.
User has required ordering on TENKTUP2.unique1 and current row ordering satisfies that ordering
because of the index which is being considered for TENKTUP2.

So, if my understanding of code is correct, the sorting is getting avoided NOT based on the
fact that equality condition exists between the two optimizables ie TENKTUP1.unique1 = TENKTUP2.unique1
In other words, sorting avoidacne decision was not based on TENKTUP1.unique1 = TENKTUP2.unique1
It was because optimizer decides that the individual sorting required on the optimizables
have been satisfied by indexes picked on them.

Hope this helps understand the current behavior of the trunk code for the query in question.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: d3926_repro.sql, derby-reproduce.zip, DERBY3926_notforcheckin_patch1_051109_diff.txt,
DERBY3926_notforcheckin_patch1_051109_stat.txt, DERBY3926_notforcheckin_patch2_051109_diff.txt,
DERBY3926_patch3_051509_diff.txt, DERBY3926_patch3_051509_stat.txt, DERBY3926_patch4_051519_diff.txt,
DERBY3926_patch4_051519_stat.txt, script3.sql, script3WithUserFriendlyIndexNames.sql, test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024).
I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the
result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle
column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024),
PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented
me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring
the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message