db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Mamta A. Satoor (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3926) Incorrect ORDER BY caused by index
Date Mon, 04 May 2009 16:56:32 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12705646#action_12705646
] 

Mamta A. Satoor commented on DERBY-3926:
----------------------------------------

I went through the optimize phase through the debugger and it appears to me (I may be wrong
and would appreciate others looking at my detail analysis of the optimize phase below) that
the problem may be with the generate phase or the execute phase where we may be not using
the non-unique index on table2 correctly to fetch the orders in row.

The query in question is as below
SELECT table1.id, table2.value, table3.value FROM --DERBY-PROPERTIES joinOrder=FIXED
table3 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table3
, table2 -- DERBY-PROPERTIES index=nonUniqueOnValue_Table2
, table1
WHERE table1.id=table2.id AND table2.name='PageSequenceId' 
AND table1.id=table3.id 
AND table3.name='PostComponentId' 
AND table3.value='21857' ORDER BY table2.value;

For the query above, in addition to the predicates supplied by the user, optimizer internally
generates another predicate, namely, table3.id=table2.id
So for the queyr, all the predicates are as follows
1)table1.id=table2.id 
2)table1.id=table3.id 
3)table3.id=table2.id
4)table2.name='PageSequenceId' 
5)table3.name='PostComponentId' 
6)table3.value='21857' 

Of the predicates above, 4), 5) and 6) can be pushed down to the corresponding optimizables
ie 4) will be associated with table2 and 5),6) will be associated with table3. This is because
these predicates are constant comparison with columns. This leaves us with 3 predicates, namely
1), 2), 3)
which are multitable join predicates.

Optimizer has a class called RowOrdering associated with it (OptimizerImpl.currentRowOrdering).

currentRowOrdering has following fields in it.
currentRowOrdering	RowOrderingImpl  
	alwaysOrderedOptimizables	Vector<E>
	columnsAlwaysOrdered	ColumnOrdering
	currentColumnOrdering	null	
	ordering	Vector<E>  
	unorderedOptimizables	Vector<E>  
All the predicates that are constant comparison will go into columnsAlwaysOrdered. These pushing
of constant comparison predicates happen per optimizable basis when that particular optimizable
is being consdiered in the possible join order combination.

In our specific query, through optimizer overrides, we have instructed optimizer to only consider
join order [table3, table2, table1]. The optimizer starts with [table3, -1, -1]. First thing
it does is it goes through the join predicates (which are 1), 2) and 3) in the predicate list
above). But since
all the referenced tables for any of the 3 predicates are not covered by the current join
order of [table3, -1, -1], nothing gets done to those join predicates. Next, the optimizer
will tell 
currentRowOrdering to (this happens in FromBaseTable(FromTable).tellRowOrderingAboutConstantColumns(RowOrdering,
OptimizablePredicateList) line: 1477) to add predicates 5) and 6) from above list into it's
columnsAlwaysOrdered list. So, at the end of the
[table3, -1, -1], currentRowOrdering.columnsAlwaysOrdered will look as follows 
Direction: 3 Table 0, Column 3 Table 0, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value)
and column 2(name) which are always ordered because they are being compared with constants.
So far, the logic for currentRowOrdering seems to be working fine. Next, we have asked the
optimizer to use the index index=nonUniqueOnValue_Table3 on Table3. This index covers the
predicate 6) since that predicate is on the same column on which the index is created but
it does not cover the other columns from table3 that are being referenced in this query (which
table3.id and table3.name). Because of this, we determine that the index being considered
is not a covering index. The code to determine whether the sorting can be avoided for [table3,
-1, -1], is in OrderByList.sortRequired(RowOrdering, JBitSet) method. Since order by is on
table2.value, the order by column's table does not match with table3 and hecne we determine
that sorting is not required based on what optimizer has seen so far. So it appears that we
leave it to table2 when its turn comes in the join order to decide whether sorting should
indeed be avoided or not.

Next we consider the join order [table3, table2, -1]. For table2, we have asked the optimizer
to use index=nonUniqueOnValue_Table2. First thing that we do is go through the join predicates
1), 2) and 3). Predicate number 3) which is TABLE3.ID = TABLE2.ID can be pushed down to optimizable
table2 because the current join order [table3, table2, -1] includes the tables referenced
by predicate 3). So, at this point, there are 2 predicates pushed down to table3, they are
number 5) and 6). And for table2, there are 2 prdicates pushed down to it, they are number
3) and 4). Also, since predicate 4) is a constant comparison, it will get added to currentRowOrdering.
At this point, currentRowOrdering.columnsAlwaysOrdered will look as folows
Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
We are saying above that Table at position 0 (which is Table3 in our eg) has column 3(value)
and column 2(name) which are always ordered because they are being compared with constants.
In addition, Table at position 1(which is Table2 in our join order) has column 2 which is
always ordered because it is being compared with constant. Next, we have asked the optimizer
to use the index nonUniqueOnValue_Table2 but it does not cover the constant comparison predicate
4) since that predicate is on column name and not value. Notice, this is a different code
path we are following for table2 compared to table3 above. Because table3.value is not already
an ordered column in currentRowOrdering because there is no
constant comparison predicate on it, we add it to the "ordering " vector in currentOrdering
object. This is the first object that gets added to the currentRowOrdering."ordering" vector
in our eg. So, at this point, the currentRowOrdering has only 3 of it's fields propulated
and they are as follows
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 1, Column 3
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3]
The index nonUniqueOnValue_Table2 does not cover any predicate on Table2 and it does not cover
all the column from table2 that are being referenced in this query and hence it is not a covering
index. Next, the code to determine whether sort can be avoided for join order [table3, table2,
-1], we go through the code path in OrderByList.sortRequired(RowOrdering, JBitSet) method.
We find that the order by column's table matches with table2 in join order. Because of this
match, we need to look at currentRowOrdering to see if it will take care of the sorting and
if so we can avoid the sort. To look into currentRowOrdering, we first call currentRowOrdering.alwaysOrdered(cr.getTableNumber())
(in this call, cr is the order by column). So, we are checking if table2 is always ordered
in currentRowOrdering. Since table2 is not always ordered in this query, this check returns
false. Next, we check if not the entire table, is the order by table.order by column combination
always ordered in currentRowOrdering. In our query, that will be table2.value Since there
is no constant comparison predicate on table2.value, it is not going to be in columnsAlwaysOrdered
vector in currentRowOrdering.
For reference, currentRowOrdering looks as follws
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 1, Column 3
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3]
As we can see from currentRowOrdering object above, columnsAlwaysOrdered does not include
Table 1, Column 3. So, we have not found table2 to be always ordered and we have not found
table2.value to be always ordered either. The last place to check is the ordering vector in
columnsAlwaysOrdered. This vector does include Table 1, Column 3 which is table2.value and
hence we determine that sorting is not needed to table2. All this code of checking the columnsAlwaysOrdered
happens in OrderByList.sortRequired(RowOrdering, JBitSet). Assuming that this code is working
as intended, I think then the culprit might be when we generate the code. The only step in
optimize left is to add table1 to the join order. So, at the end of the optimize phase, the
join order will look as follows [table2, table2, table1] and currentRowOrdering looks as follows
columnsAlwaysOrdered	ColumnOrdering
	Direction: 3 Table 0, Column 3 Table 0, Column 2 Table 1, Column 2
currentColumnOrdering	ColumnOrdering 
	Direction: 1 Table 2, Column 1
ordering	Vector<E>  
	[Direction: 1 Table 1, Column 3, Direction: 1 Table 2, Column 1]

The only change to currentRowOrdering that is caused by adding of table1 in third join order
position is that we are going to use primary key on table1 and hence we need to reflect that
in currentRowOrdering by adding it to the ordering vector.

> Incorrect ORDER BY caused by index
> ----------------------------------
>
>                 Key: DERBY-3926
>                 URL: https://issues.apache.org/jira/browse/DERBY-3926
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.1.3.3, 10.2.3.0, 10.3.3.1, 10.4.2.0
>            Reporter: Tars Joris
>            Assignee: Mamta A. Satoor
>         Attachments: derby-reproduce.zip, script3.sql, script3WithUserFriendlyIndexNames.sql,
test-script.zip
>
>
> I think I found a bug in Derby that is triggered by an index on a large column: VARCHAR(1024).
I know it  is generally not a good idea to have an index on such a large column.
> I have a table (table2) with a column "value", my query orders on this column but the
result is not sorted. It is sorted if I remove the index on that column.
> The output of the attached script is as follows (results should be ordered on the middle
column):
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> 4294967297          |000001       |21857
> While I would expect:
> ID                  |VALUE        |VALUE
> ----------------------------------------------
> 4294967297          |000001       |21857
> 2147483653          |000002       |21857
> 2147483654          |000003       |21857
> This is the definition:
> CREATE TABLE table1 (id BIGINT NOT NULL, PRIMARY KEY(id));
> CREATE INDEX key1 ON table1(id);
> CREATE TABLE table2 (id BIGINT NOT NULL, name VARCHAR(40) NOT NULL, value VARCHAR(1024),
PRIMARY KEY(id, name));
> CREATE UNIQUE INDEX key2 ON table2(id, name);
> CREATE INDEX key3 ON table2(value);
> This is the query:
> SELECT table1.id, m0.value, m1.value
> FROM table1, table2 m0, table2 m1
> WHERE table1.id=m0.id
> AND m0.name='PageSequenceId'
> AND table1.id=m1.id
> AND m1.name='PostComponentId'
> AND m1.value='21857'
> ORDER BY m0.value;
> The bug can be reproduced by just executing the attached script with the ij-tool.
> Note that the result of the query becomes correct when enough data is changed. This prevented
me from creating a smaller example.
> See the attached file "derby-reproduce.zip" for sysinfo, derby.log and script.sql.
> Michael Segel pointed out:
> "It looks like its hitting the index ordering on id,name from table 2 and is ignoring
the order by clause."

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message