db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rick Hillegas (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-642) SELECT MAX doesn't use indices optimally
Date Tue, 08 Mar 2011 20:43:59 GMT

     [ https://issues.apache.org/jira/browse/DERBY-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Rick Hillegas updated DERBY-642:
--------------------------------

    Attachment: max2.sql
                max.sql
                Timestamper.java

Attaching Timestamper.java, max.sql, and max2.sql. Together these form a test which verifies
the value of this improvement. Timestamper is a Derby function which reports the number of
seconds since it was called last. The max.sql script creates a table which has 3M rows and
then indexes it. One out of every 100 rows contains a null. The max2.sql script selects the
max value from the table and reports how long the aggregate took. When run on Derby 10.7.1,
the aggregate takes 10 seconds. When run on 10.8.1, the aggregate returns the correct value
immediately. Note that there is some instability in Derby which affects the max.sql script.
I have noticed that sometimes the index creation dies on an OOM error. This happens as far
back as 10.5.3.

> SELECT MAX doesn't use indices optimally
> ----------------------------------------
>
>                 Key: DERBY-642
>                 URL: https://issues.apache.org/jira/browse/DERBY-642
>             Project: Derby
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions: 10.2.1.6
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>            Priority: Minor
>             Fix For: 10.8.0.0
>
>         Attachments: Timestamper.java, derby-642-1a.diff, derby-642-1b-withTests.diff,
derby-642-2a-test-serializable.diff, derby-642-3a-waitBeforeRetry.diff, max.sql, max2.sql
>
>
> I tried running SELECT MAX on an indexed column in a big (8 GB)
> table. It took 12 minutes, which is about 12 minutes more than I
> expected.
> After a bit of investigation, I found out that a full index scan was
> performed because all the rows referenced from the rightmost B-tree
> node were actually deleted.
> Possible improvements:
>   1. Implement backwards scan in the B-trees (this is also suggested
>      in the comments in BTreeMaxScan).
>   2. Go up to the parent node and down to the next leaf node on the
>      left side, and continue until a valid max value is found. In
>      Derby, traversing up in a B-tree is not allowed, but would it be
>      ok to go up if the latches were kept on the higher-level nodes in
>      the tree? Of course, this would have negative impact on
>      concurrency.
>   3. Right-to-left traversal on the leaf level is possible (using
>      ControlRow.getLeftSibling()), but will throw an exception if the
>      page cannot be latched without waiting. It is therefore possible
>      to try a right-to-left search for the max value, and only fall
>      back to the left-to-right approach if a conflict arises.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

Mime
View raw message