From: Barry Carter
To: java-user@lucene.apache.org
Date: Wed, 27 Jul 2005 19:42:18 -0600 (MDT)
Subject: Lucene vs Derby (vs MySQL) for spatial indexing

Does Lucene optimize range queries that use Sort and/or limit the number of hits?

My situation: I have a listing of 2 million cities, with the name, latitude, longitude, and population of each city.
I want to efficiently find the 50 most populous cities between (for example) latitudes 35.2 and 41.7 and longitudes 19.8 and 27.9.

Assuming I normalize the data so that it sorts lexically (in other words, I'll write 7.52 as 007.52, so it comes before 111.01 instead of after it), can I use a range query on the latitude and longitude fields (limiting the number of hits to 50, and sorting by population descending) to efficiently find what I want?

If sorting isn't efficient, can I simply boost each record by its population (so that high-population cities are returned first) and then limit the number of hits (so I see only the 50 most populous cities in a given area)?

I tried this in Derby, the code being:

    Statement s = DriverManager.getConnection("jdbc:derby:test;create=false").createStatement();
    s.setMaxRows(50);
    ResultSet rs = s.executeQuery("SELECT * FROM cities where lat>35.2 and lat<41.7 and lon>19.8 and lon<27.9 ORDER BY population desc");

but Derby inefficiently looks at ALL the cities matching my criteria (even with indexes on lat and lon and population) before returning the top 50 (this is really bad when the condition is "lat>-90 and lat<90 and lon>-180 and lon<180", for example).

The MySQL equivalent ("SELECT * FROM cities where lat>35.2 and lat<41.7 and lon>19.8 and lon<27.9 ORDER BY population desc LIMIT 50") with the same indexes is more efficient (it uses the LIMIT condition to optimize the query), and using MySQL with spatial indexes is even more efficient. However, I'm doing this as part of a Java application, so I need something that can be embedded in Java.

Is this a reasonable use of Lucene? Or is coercing Lucene into doing range-based numeric queries a bad idea?
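
To make the zero-padding idea above concrete, here is a minimal sketch of one way the normalization could look. Plain zero-padding does not order negative numbers correctly as strings (for instance "-007.52" would sort before "-111.01"), so this sketch first shifts each value by a fixed offset to make it non-negative. The class name SortableCoord, the method names, and the two-decimal precision are assumptions for illustration only; none of this is Lucene API.

```java
import java.util.Locale;

// Hypothetical helper, not part of Lucene: turns coordinates into
// fixed-width strings whose lexical order matches numeric order.
final class SortableCoord {

    // Shift by `offset` so the value is non-negative, then zero-pad to
    // width 6 with 2 decimals (e.g. 97.52 becomes "097.52").
    // Assumes value + offset stays within 0.00 .. 999.99.
    static String encode(double value, double offset) {
        return String.format(Locale.ROOT, "%06.2f", value + offset);
    }

    // Latitude ranges over -90..90, so shifting by 90 gives 0..180.
    static String encodeLat(double lat) {
        return encode(lat, 90.0);
    }

    // Longitude ranges over -180..180, so shifting by 180 gives 0..360.
    static String encodeLon(double lon) {
        return encode(lon, 180.0);
    }

    public static void main(String[] args) {
        // The query bounds from the example would be encoded the same way
        // as the indexed values before being used as range endpoints.
        System.out.println(encodeLat(35.2) + " .. " + encodeLat(41.7));
        System.out.println(encodeLon(19.8) + " .. " + encodeLon(27.9));
    }
}
```

The same encoding would have to be applied both when storing the fields and when building the range bounds; otherwise the string comparison would not correspond to the numeric one.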
(In case anyone's interested, I'm writing a zoomable/pannable world map, so finding the biggest cities in a given area quickly is important.)
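
For reference, the selection described above ("the 50 most populous cities in a box") can be sketched in plain Java with a bounded min-heap. This is not how Lucene, Derby, or MySQL execute the query; it is only an in-memory illustration, and the City type, its field names, and the topInBox method are hypothetical. It still visits every city, but it never sorts the full set of matches: only the current best k are kept.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical in-memory sketch; not Lucene, Derby, or MySQL code.
final class TopCities {

    // Illustrative record type; the field names are assumptions.
    static final class City {
        final String name;
        final double lat;
        final double lon;
        final long population;

        City(String name, double lat, double lon, long population) {
            this.name = name;
            this.lat = lat;
            this.lon = lon;
            this.population = population;
        }
    }

    // Returns up to k cities strictly inside the box, most populous first.
    static List<City> topInBox(Iterable<City> cities,
                               double minLat, double maxLat,
                               double minLon, double maxLon, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        Comparator<City> byPopulation =
                Comparator.comparingLong((City c) -> c.population);
        // Min-heap: the smallest of the current top k sits at the head.
        PriorityQueue<City> heap = new PriorityQueue<>(k, byPopulation);
        for (City c : cities) {
            boolean inBox = c.lat > minLat && c.lat < maxLat
                    && c.lon > minLon && c.lon < maxLon;
            if (!inBox) {
                continue;
            }
            if (heap.size() < k) {
                heap.add(c);
            } else if (c.population > heap.peek().population) {
                heap.poll();   // drop the smallest of the current top k
                heap.add(c);
            }
        }
        List<City> result = new ArrayList<>(heap);
        result.sort(byPopulation.reversed());
        return result;
    }
}
```

With n matching cities this costs roughly O(n log k) comparisons rather than a full O(n log n) sort, which matters most when the box covers the whole map and nearly every city matches.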