db-derby-user mailing list archives

From: Geoff hendrey <geoff_hend...@yahoo.com>
Subject: Re: The result offset and fetch first clauses
Date: Mon, 04 May 2009 14:42:31 GMT
Thanks for the feedback.

I guess it *is* too soon for me to throw away my code :-)

I do think it is important for Derby to optimize the OFFSET/NEXT implementation. "Paging"
is inherently something you do when you have large result sets. If OFFSET has to read and
process all of the result set up to the offset, then it really defeats the purpose and
doesn't seem like anything more than a syntactic shortcut.
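(For reference, a paging query with the new clauses might look like the minimal JDBC
sketch below; the ORDERS table, its columns, and the literal page numbers are hypothetical.)

    // Minimal sketch of paging with the result offset and fetch first clauses.
    // The ORDERS table and its columns are hypothetical examples.
    import java.sql.*;

    public class OffsetFetchExample {
        public static void printThirdPage(Connection conn) throws SQLException {
            // Skip the first 100 rows, then return the next 50 (rows 101-150).
            String sql = "SELECT ORDER_ID, CUSTOMER FROM ORDERS "
                       + "ORDER BY ORDER_ID "
                       + "OFFSET 100 ROWS FETCH NEXT 50 ROWS ONLY";
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                while (rs.next()) {
                    System.out.println(rs.getInt(1) + " " + rs.getString(2));
                }
            }
        }
    }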

Could we discuss this further on this list, and decide if an issue should be opened for optimization?

 -geoff
“XML? Too much like HTML. It'll never work on the Web!” 
-anonymous 





________________________________
From: Dag H. Wanvik <Dag.Wanvik@Sun.COM>
To: Derby Discussion <derby-user@db.apache.org>
Sent: Monday, May 4, 2009 5:19:15 AM
Subject: Re: The result offset and fetch first clauses

Geoff hendrey <geoff_hendrey@yahoo.com> writes:


> 1) can I use this feature even if I am ordering on multiple columns?

As Tiago said, the feature works on all result sets, including
updatable ones.

>
> 2) what are the performance implications for users of the embedded
> driver? In particular, with the embedded driver I am hoping that
> this feature allows portions of a result set to be retrieved without
> the overhead of retrieving the entire result set. For example, if I

I am afraid that with the embedded driver you will only save a little
CPU (by avoiding some JDBC calls), since under the hood the code
siphons off the rows until it hits the offset; with a large offset you
will still incur the cost of reading those rows (modulo page caching).
In a client/server context the savings are larger, of course, in that
fewer rows are sent over the wire. For simple queries that can use an
index, the optimizer could use the offset information to avoid reading
the entire row when skipping rows before the offset, counting rows in
the index to reach the first qualifying row, but this optimization is
not yet implemented.

Often this feature is used together with ORDER BY, which entails
sorting the result set, so all the rows have to be read anyway. Again,
for some simple queries the optimizer can avoid the sort, so
optimization is still possible for such queries.
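(As a hedged illustration: sort avoidance generally requires the ORDER BY column to be
covered by an index. The sketch below uses a hypothetical ORDERS table and index; whether
the sort is actually avoided depends on the plan the optimizer chooses.)

    // Sketch only: an index on the ORDER BY column may let the optimizer
    // return rows in index order and skip the explicit sort. All names
    // here are hypothetical.
    import java.sql.*;

    public class SortAvoidanceSketch {
        public static void query(Connection conn) throws SQLException {
            try (Statement stmt = conn.createStatement()) {
                // Index on the ordering column (hypothetical schema).
                stmt.executeUpdate(
                    "CREATE INDEX ORDERS_BY_ID ON ORDERS(ORDER_ID)");
                try (ResultSet rs = stmt.executeQuery(
                         "SELECT ORDER_ID FROM ORDERS ORDER BY ORDER_ID "
                         + "OFFSET 1000 ROWS FETCH NEXT 50 ROWS ONLY")) {
                    while (rs.next()) {
                        System.out.println(rs.getInt(1));
                    }
                }
            }
        }
    }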

If you think this optimization is an important capability, feel free
to file an improvement issue for it.

Rows beyond the FETCH FIRST n ROWS limit are not processed for simple
queries. For complex queries they might still be read in order to
perform the sorting.

> If (2) is not efficient, how does it compare to the efficiency of the following approach:
>
> Get the result set. Use a loop to increment integer n by PAGE_SIZE,
> and inside the loop use ResultSet.absolute(n) combined with
> stmt.setFetchSize(1) to retrieve a "marker" row that signifies the
> beginning of each "page" of the result set. I use the primary keys of
> these "markers" as page boundaries so that my web application can
> provide links to a set of pages evenly distributed throughout the
> result set.

This sounds like a reasonable approach to me, provided any concerns
about concurrent inserts/deletes are addressed or are irrelevant.
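
(A rough sketch of that marker-row approach, assuming a hypothetical ORDERS table whose
primary key ORDER_ID defines the paging order, might look like this; note that scrolling
to a high row number may still make the engine process the intervening rows.)

    // Hedged sketch of the marker-row pagination idea described above.
    // Table, column, and page-size values are hypothetical.
    import java.sql.*;
    import java.util.ArrayList;
    import java.util.List;

    public class PageMarkers {
        /** Collect the primary key of the first row of each page. */
        public static List<Integer> collectMarkers(Connection conn, int pageSize)
                throws SQLException {
            List<Integer> markers = new ArrayList<Integer>();
            // Scrollable, read-only result set so absolute() positioning works.
            try (Statement stmt = conn.createStatement(
                     ResultSet.TYPE_SCROLL_INSENSITIVE,
                     ResultSet.CONCUR_READ_ONLY)) {
                stmt.setFetchSize(1); // hint: only one row is needed per jump
                try (ResultSet rs = stmt.executeQuery(
                         "SELECT ORDER_ID FROM ORDERS ORDER BY ORDER_ID")) {
                    // Jump to rows 1, 1 + pageSize, 1 + 2*pageSize, ... and
                    // record each marker key until absolute() runs off the end.
                    for (int n = 1; rs.absolute(n); n += pageSize) {
                        markers.add(rs.getInt("ORDER_ID"));
                    }
                }
            }
            return markers;
        }
    }

Each page link can then presumably query with WHERE ORDER_ID >= <marker>, which an index
on the key can satisfy without re-scanning the preceding pages.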

Dag
