db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Knut Anders Hatlen (JIRA)" <j...@apache.org>
Subject [jira] Updated: (DERBY-642) SELECT MAX doesn't use indices optimally
Date Wed, 09 Feb 2011 18:11:57 GMT

     [ https://issues.apache.org/jira/browse/DERBY-642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel

Knut Anders Hatlen updated DERBY-642:

    Attachment: derby-642-1a.diff

Here's a patch that attempts to solve this issue by scanning the
B-tree backwards to find the maximum value, along the lines sketched
out in DERBY-884. The patch is not ready for commit yet since it
contains no tests and some comments in the BTreeMaxScan still describe
the old behaviour. There are also some parts that need a little more
investigation, see more below.

The patch passes suites.All and derbyall, and I have been able to
exercise most of the new code in manual tests and it seems to be
well-behaved. Some of the code paths are very difficult to test
reliably, though, since they're so timing dependent.

The patch changes the code the following way:

* BTreeScan.java:

- Added a new method positionAtPreviousPage(), which is similar to
positionAtNextPage() except that it moves in the opposite direction
and that it throws a WaitError instead of waiting for a latch in case
of a conflict.

Another difference is that positionAtPreviousPage() skips empty leaf
pages. This is to avoid the problem with saving the current position
when there is no key to save on the current page. When scanning in the
forward direction, we only save the position when we have to wait for
a row lock or we have filled the internal fetch buffer. In both cases
we are positioned on a row when we save the position, and then we just
need to store the key on the current row in order to save the

When scanning in the backward direction, however, we may also have to
save the position when moving from one leaf page to its sibling page,
if the sibling page cannot be latched immediately. Leaf pages can be
empty, and then there is no key on the page that we can store and use
for repositioning later. So in order to be sure that we have a
position to save in the case where we have to wait for a latch,
positionAtPreviousPage() holds the latch on the current page until it
finds the first non-empty leaf page on its left side.


- Removed the method fetchMaxRowFromBeginning(), since the code now
always fetches the max row from the end.

- Made fetchMax() continue on the previous page instead of giving up
and starting from the beginning in case no qualifying row is found on
the last page.

- Made fetchMax() reposition using the saved position in case of a
lock conflict. Previously, it would have started a full scan from the
beginning of the index in that case.

- Added new method moveToLeftSibling(), used by fetchMax() to move to
the logically next page. This method saves the current position if
positionAtPreviousPage() throws a WaitError so that fetchMax() can
release all latches and reposition on the exact same spot.

Some things that differ between repositioning in the forward direction
and in the backward direction:

a) As mentioned above, we may need to save the position when moving
from one page to another, and saving the position by key value isn't
possible if the current page is empty. positionAtPreviousPage() skips
empty pages so that we won't have to save the position on an empty
page mid-scan. However, the rightmost leaf page may be empty, in which
case the scan starts on an empty page. If we need to wait for a latch
when moving from the rightmost leaf and that leaf is empty, we don't
attempt to save the position. Instead of repositioning the scan, we
will restart it completely, which should be a safe thing to do since
we haven't seen any rows yet.

b) After repositioning, we should be positioned on the exact same row
as where we were when the position was saved. However, if the row has
been purged in the meantime, reposition() will place us immediately to
the left of where the row should have been. This is fine for forward
scans, but not for backward scans. Backward scans should be positioned
immediately to the right of that position (an issue also raised by
Mike in DERBY-884). I didn't find any existing code to position us on
the right side of the purged row (there's code to do that for partial
key scans, but repositioning uses full keys). So in the case where
reposition() can't position on the exact same key as the one we saved,
the patch now adds one to the current slot number, which moves the
position one slot to the right.

Some issues that could need improvement or more investigation:

- When a latch conflict is detected, the position is saved, all
latches released, and the operation is retried immediately. If the
other thread hasn't released the latch yet, we'll run into the same
latch conflict again, and we may loop a couple of times, saving the
position and reposititioning each time, before we can successfully
latch the previous page. If we want to avoid this active waiting, we
could refine this approach by doing an ordinary getPage() to latch the
sibling page after we have released the latch on the current page, and
wait until the page can be latched (and unlatched again) before we
reposition. Since we don't hold the latch on the current page while
waiting for the latch on the previous page, waiting here shouldn't
cause any deadlocks.

- The current implementation of fetchMax(), also before the patch,
doesn't call unlockScanRecordAfterRead() to release read locks on
unqualified records once we've moved past them. Other scans do that,
and we should probably do that in max scans as well, but the current
patch doesn't change this.

- positionAtPreviousPage() was placed in the BTreeScan class since
that's where positionAtNextPage() was. Probably these methods would be
better placed in sub-classes for backward scan and forward scan. I
think the reason why positionAtNextPage() was in that class, was that
BTreeMaxScan used it when doing a max scan from the beginning of the
index. Since the patch removed that code, the methods could be moved
now. But I suppose that could rather be done as a cleanup after on.

- Testing. We need tests that exercise the code paths that handle the
latch conflicts. Some of these are very hard to test, especially the
cases where the row on the current position has been purged in the
small window between the saving of the current position and the
repositioning. I have been able to exercise that code path by using a
debugger to coordinate a forward scanner, a backward scanner and the
B-tree post commit thread so that the row disappeared at the exact
right moment to make the backward scanner take that path. I have only
been able to trigger this path for repositioning after a latch
conflict, though.

To exercise the corresponding path for lock conflicts, I think the
backward scanner needs to be timed so that it ends up waiting for a
lock held by the B-tree post commit thread. That's the only way I can
see how a row can have been purged when the scanner wakes up after
having been granted the lock on the row, but perhaps someone sees
other ways this could happen? Apart from the obvious timing problem of
getting the scanner queued up right behind the post commit thread, I
think the window where the post commit thread holds locks without
holding the latch on the page it's working on, is extremely small. The
backward scanner would have to access the page in this small window,
otherwise it would be blocked by the latch and not by the lock.

Next, I will write some JUnit tests that exercise the new code, and
also add some tracing so that we can verify that the code actually has
been exercised.

Feedback about the suggested approach would be greatly appreciated.

> SELECT MAX doesn't use indices optimally
> ----------------------------------------
>                 Key: DERBY-642
>                 URL: https://issues.apache.org/jira/browse/DERBY-642
>             Project: Derby
>          Issue Type: Improvement
>          Components: Store
>    Affects Versions:
>            Reporter: Knut Anders Hatlen
>            Assignee: Knut Anders Hatlen
>            Priority: Minor
>         Attachments: derby-642-1a.diff
> I tried running SELECT MAX on an indexed column in a big (8 GB)
> table. It took 12 minutes, which is about 12 minutes more than I
> expected.
> After a bit of investigation, I found out that a full index scan was
> performed because all the rows referenced from the rightmost B-tree
> node were actually deleted.
> Possible improvements:
>   1. Implement backwards scan in the B-trees (this is also suggested
>      in the comments in BTreeMaxScan).
>   2. Go up to the parent node and down to the next leaf node on the
>      left side, and continue until a valid max value is found. In
>      Derby, traversing up in a B-tree is not allowed, but would it be
>      ok to go up if the latches were kept on the higher-level nodes in
>      the tree? Of course, this would have negative impact on
>      concurrency.
>   3. Right-to-left traversal on the leaf level is possible (using
>      ControlRow.getLeftSibling()), but will throw an exception if the
>      page cannot be latched without waiting. It is therefore possible
>      to try a right-to-left search for the max value, and only fall
>      back to the left-to-right approach if a conflict arises.

This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira


View raw message