db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "A B (JIRA)" <j...@apache.org>
Subject [jira] Commented: (DERBY-3279) Derby 10.3.X sorts VARCHAR column incorrectly when table is indexed and VARCHAR column contains a value of '00000'
Date Mon, 17 Dec 2007 00:53:43 GMT

    [ https://issues.apache.org/jira/browse/DERBY-3279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12552306
] 

A B commented on DERBY-3279:
----------------------------

In the absence of IN list "multi-probing", the way Derby implements an IN list is:

  while (more rows in the underlying result set)
  {
      get next row from the result set
      search IN list values to see if target column of row exists in the IN list
      if target column of the row exists in the IN list, return the row
  }

Since the loop is based on scanning the underlying result set for rows, any rows which we
return will be returned in the order they are received from the underlying result set.  So
if, for example, the underlying result is an index scan, and if the index is defined as DESCENDING,
the rows will come back in descending order.

With multi-probing enabled, which only happens after DERBY-47 and only happens for index scans
(which is why the ordering is correct if the index is removed, per Ajay's description), the
processing is (loosely):

  while (more values in the IN list)
  {
      get next IN list value
      probe underlying index for a row whose target column matches the current IN list value
      if index probe returned a row, return that row
  }

Notice how, here, the loop is driven by the order of the values in the *IN list*, instead
of by the order of the rows as they are returned from the index scan.  So even if the index
is defined as "DESC", the final ordering of the rows will match the ordering of the IN values.

That said, in order to ensure proper handling of duplicate IN list values, the current multi-probe
logic in Derby always sorts IN list values in ASCENDING order.  So unless an explicit SortResultSet
is generated for execution time, the rows for an IN-list multi-probe will always be in ASCENDING
order.

The final relevant piece of information is that the optimizer is smart enough to remove unnecessary
sorts when a) there is an ORDER BY on some index column IC, b) the final access plan uses
that index, and c) the ASC/DESC option of the ORDER BY matches the ASC/DESC option of the
index column IC.  For the example queries, we have "ORDER BY CHEESE_CODE DESC", and we have
an index definition which includes "CHEESE_CODE DESC".  So the optimizer thinks the sort is
unnecessary and removes it.  Hence this Jira.

There are two fixes that come to mind right away:

  1. Make the optimizer recognize that the sort cannot be removed if we're multi-probing,
or
  2. Make MultiProbeTableScanResultSet sort the IN list values in ascending OR descending
order depending on the ORDER BY clause (instead of always sorting in ASCENDING).

I think option #2 is the best option here.  I have a fix in progress and will post when it
is postable...

> Derby 10.3.X sorts VARCHAR column incorrectly when table is indexed and VARCHAR column
contains a value of '00000'
> ------------------------------------------------------------------------------------------------------------------
>
>                 Key: DERBY-3279
>                 URL: https://issues.apache.org/jira/browse/DERBY-3279
>             Project: Derby
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 10.3.1.4, 10.3.2.1
>         Environment: Rational Application Developer 7.0.0.2 (Eclipse 3.2.2), J2RE 1.5.0
IBM J9 2.3 Windows XP
>            Reporter: Ajay Bhala
>
> Running the following produces the error seen in Derby 10.3.X but not in 10.2.X nor in
10.1.X.
> Don't know if this related to DERBY-3231.
> First query is incorrectly sorted whereas the second one is okay when there is an index
on the table. 
> If the table is not indexed, the sort works correctly in DESC order.
> ------
> create table CHEESE (
>   CHEESE_CODE       VARCHAR(5),
>   CHEESE_NAME       VARCHAR(20),
>   CHEESE_COST       DECIMAL(7,4)
> );
> create index cheese_index on CHEESE (CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST
DESC);
> INSERT INTO CHEESE (
>   CHEESE_CODE,
>   CHEESE_NAME,
>   CHEESE_COST)
> VALUES ('00000', 'GOUDA', 001.1234),
>        ('00000', 'EDAM', 002.1111),
>        ('54321', 'EDAM', 008.5646),
>        ('12345', 'GORGONZOLA', 888.2309),
>        ('AAAAA', 'EDAM', 999.8888),
>        ('54321', 'MUENSTER', 077.9545);
> SELECT * FROM CHEESE 
> WHERE (CHEESE_CODE='00000' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
> ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;
> SELECT * FROM CHEESE 
> WHERE (CHEESE_CODE='AAAAA' OR CHEESE_CODE='54321') AND CHEESE_NAME='EDAM'
> ORDER BY CHEESE_CODE DESC, CHEESE_NAME DESC, CHEESE_COST DESC;

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


Mime
View raw message