lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Otis Gospodnetic <otis_gospodne...@yahoo.com>
Subject Re: Lucene vs Derby (vs MySQL) for spatial indexing
Date Thu, 28 Jul 2005 03:50:51 GMT
Barry,

You may also want to consider PostgreSQL for a few reasons: 1) it's
historically known to work well for geo-spatial data, 2) has
GIS/geo-spatial data types and such, and 3) it seems that the new
versions let you embed Java directly into the database (perhaps
something like Oracle's Java-embedding thing).

Otis


--- Barry Carter <barry.carter@bigfoot.com> wrote:

> Does Lucene optimize range queries that use Sort and/or limit the
> number 
> of hits?
> 
> My situation: I have a listing of 2 million cities, with the name,
> latitude, longitude, and population of each city. I want to
> efficiently
> find the 50 most populous cities between (for example) latitudes 35.2
> and
> 41.7 and longitudes 19.8 and 27.9
> 
> Assuming I normalize the data to be lexically sorted (in other words,
> I'll
> write 7.52 as 007.52, so it comes before 111.01 instead of after it),
> can
> I use a range query on the latitude and longitude fields (limiting
> the
> number of hits to 50, and sorting by population descending) to
> efficiently
> find what I want?
> 
> If sorting isn't efficient, can I simply boost each record by its
> population (so that high population cities are returned first) and
> then
> limit the number of hits (so I see only the 50 most populous cities
> in a
> given area)?
> 
> I tried this in Derby, the code being:
> 
> Statement s =
>
DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
> s.setMaxRows(50);
> rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7
> and lon>19.8 and lon<27.9 ORDER BY population desc");
> 
> but Derby inefficiently looks at ALL the cities matching my criteria
> (even
> with indexes on lat and lon and population) before returning the top
> 50
> (this is really bad when the condition is "lat>-90 and lat<90 and
> lon>-180
> and lon<-180", for example).
> 
> The MySQL equivalent ("SELECT * FROM cities where lat>35.2 and
> lat<41.7
> and lon>19.8 and lon<27.9 ORDER BY population desc LIMIT 50") with
> the
> same indexes is more efficient (it uses the LIMIT condition to
> optimize
> the query), and using MySQL w/ spatial indexes is even more
> efficient.  
> However, I'm doing this as part of a Java application, so need
> something
> that can be embedded in Java.
> 
> Is this a reasonable use of Lucene? Or is coercing Lucene into doing
> range-based numeric queries a bad idea?
> 
> (In case anyone's interested, I'm writing a zoomable/pannable world
> map,
> so finding the biggest cities in a given area quickly is important)
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message