db-derby-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: Does Derby optimize queries w/ ORDER BY and "LIMIT"? (w/ indexes?)
Date Tue, 05 Jul 2005 17:43:22 GMT
I'll start by saying that I am not an optimizer expert.  I don't think
the optimizer does anything special with limits - but I could be wrong,
maybe someone else with more knowledge in that area can let us know.

Having said that, the optimizer definitely will try to eliminate sorts
necessary by order by if there are existing indexes which it can use
to get the necessary order - in the case below it needs a descending
index on the population column in cities.  Do you have such an index?

I think without such an index there is no choice in derby but to find
all 50,000
rows so that it can determine the 50 most populous, which meet your
criteria. The Derby optimizer does not consider creating intermediate
sort nodes in the query plan.  Also the sorter is not optimized for
limited result sets, it
tends to randomly create merge buckets and then sort those merge buckets
and finally merge all the merge buckets - that is about the opposite
of what you want for a limited result.  You rather have buckets for
ranges and then once you had 50 more than a given value you could throw
away all those bigger/less than the value.

The limit function as you say I believe mostly just cuts off the rows,
it doesn't affect the processing much.  When possible derby trys to
stream rows back to the user rather than processing them all before
returning the first row - but an order by without a supporting index is
a case where all the results of the query will be determined and then
as the final step they are thrown into the sorter before returning
results to the caller.  Derby query execution does not add intermediate
sort nodes.

Stan wrote:

> Does Derby optimize queries that use "ORDER BY" and "setMaxRows()" when
> indexes are present?
> 
> My situation: I have an existing Derby table listing 2 million cities,
> with the name, latitude, longitude, and population of each city. I want to
> efficiently find the 50 most populous cities between (for example)
> latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9. The query is simple:
> 
> Statement s =
> DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
>             
> s.setMaxRows(50);
>             
> rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and 
> lon>19.8 and lon<27.9 ORDER BY population desc");
> 
> There are indexes on lat, lon, and population, but the query seems to take
> a long time. In particular, setting "s.setMaxRows(50)" doesn't seem to
> speed things up at all. It looks like Derby finds ALL the cities in the
> specified latitude/longitude range (over 50,000 of them), and that
> setMaxRows() just limits how many rows it shows me, not how many rows it
> computes.
> 
> I know that MySQL optimizes queries with LIMITs -- does Derby do the same?
> 
> At the risk of offending, is there a better way to do what I want (by
> creating my own data structures for example) without necessarily using
> Derby?
> 
> (In case anyone's interested, I'm trying to update a map, so finding the
> biggest cities in a given area quickly is important)
> 
> 

Mime
View raw message