db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Knut Anders Hatlen <Knut.Hat...@Sun.COM>
Subject Re: The result offset and fetch first clauses
Date Wed, 06 May 2009 22:16:02 GMT
Alan Burlison <Alan.Burlison@Sun.COM> writes:

> Dag H. Wanvik wrote:
>
>> I am afraid that with embedded driver, you will only save a little CPU
>> (by avoiding some JDBC calls) since under the hood, the code siphons
>> off the rows till it hits the offset, so if you have a large offset,
>> you will still incur reading of those rows (modulo page caching). In
>> client/server driver context the savings are larger, of course, in
>> that fewer rows are sent over the wire. For simple queries that can
>> use an index, the optimizer could make use of the offset information
>> to avoid reading the entire row when skipping rows before offset, just
>> counting rows in the index to get to the first qualifying row, but
>> this optimization is not yet implemented.
>
> My understanding is that for queries that use >=, =< etc Derby can
> already use an index scan if the column being compared has an index on
> it - in my case it does.  So by switching to RESULT/OFFSET I'd lose
> that benefit, correct?

Yes, with >= an index scan can go directly to the first row
requested. With an offset clause the index scan would need to start at
the beginning of the index and work its way forward, counting rows. This
is because the index scans don't have an interface that allows you to
say "position on the Nth row". One part of the problem is that the index
may contain deleted rows, newly inserted rows that have not yet been
committed, and deleted rows that have not been committed, so for OFFSET
to work correctly, the scan needs to not count the deleted rows, and
possibly wait for uncommitted changes to be committed depending on the
isolation level.

Another aspect is that using >= on the index is probably more likely to
yield the correct results in the scenario you're describing, as OFFSET
doesn't guarantee that you start right after the last result on the
previous page if the table has been modified (inserts/deletes) in
between.

>> Often, this feature is used together with ORDER BY which would entail
>> some sorting of the result set and then all the rows would have to be
>> read anyway. Again, for some simple queries, sort avoidance is used by
>> the optimizer, so optimization is still possible for for such queries.
>
> What if the ORDER BY clause only uses indexed columns?  Presumably
> Derby can just return the rows in index order in that case, and no
> sort is required?

Yes, the optimizer will try to avoid sorts if possible, also if the
query contains an OFFSET clause.

Note that Derby doesn't currently walk indexes backwards, so a sort is
only avoided if the index is in the same order as the requested ordering
of the result. CREATE INDEX allows you to create both ascending and
descending indexes.
http://db.apache.org/derby/docs/10.5/ref/rrefsqlj20937.html

-- 
Knut Anders

Mime
View raw message