Message-ID: <14963485.80751288107983496.JavaMail.jira@thor>
Date: Tue, 26 Oct 2010 11:46:23 -0400 (EDT)
From: "David Smiley (JIRA)"
To: dev@lucene.apache.org
Subject: [jira] Commented: (SOLR-2155) Geospatial search using geohash prefixes
In-Reply-To: <19075058.108021286943512446.JavaMail.jira@thor>

[ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925001#action_12925001 ]

David Smiley commented on SOLR-2155:
------------------------------------

bq.
Using the canonical geohash gives facet values that can be copy&pasted with other software. Thinking again, this is a great feature. Would it be worth optimizing geohash with a Trie version? Trie fields (can be made to) show up correctly in facets.

The geohash usage is purely internal to the implementation; users don't see it when they use this field. And even if geohashes were exposed, they can be generated on demand. There's even JavaScript code I've seen to do this. So I'm not married to using geohashes -- it's their underlying hierarchical, gridded nature that is key. I'm not sure how a "trie version" of geohash would be developed. I already spoke of further refining the implementation to index the geohashes at each grid level, and I think that is very similar to what trie does for numbers.

Thanks for the suggestion of using OpenStreetMap to get locations; I'll look into that. I want to put together a useful data set -- using real data as much as possible is good. I'll need to synthesize a one-to-many document-to-points mapping, randomly, however. And I'll need to come up with various random lat-lon box queries to perform. I'd like to use Lucene's benchmark contrib module as a framework to develop the performance test. I read about it in LIA2 and it seems to fit the bill.

> Geospatial search using geohash prefixes
> ----------------------------------------
>
>                 Key: SOLR-2155
>                 URL: https://issues.apache.org/jira/browse/SOLR-2155
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: David Smiley
>         Attachments: GeoHashPrefixFilter.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on documents that have a variable number of points. This scenario occurs when there is location extraction (i.e. via a "gazetteer") occurring on free text. None, one, or many geospatial locations might be extracted from any given document, and users want to limit their search results to those occurring in a user-specified area.
> I've implemented this by furthering the GeoHash-based work in Lucene/Solr with a geohash-prefix-based filter. A geohash refers to a lat-lon box on the earth. Each successive character added further subdivides the box into a 4x8 (or 8x4, depending on the even/odd length of the geohash) grid. The first step in this scheme is figuring out which geohash grid squares cover the user's search query. I've added various extra methods to GeoHashUtils (and added tests) to assist in this purpose. The next step is an actual Lucene Filter, GeoHashPrefixFilter, that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the index. Once a matching geohash grid is found, the points therein are compared against the user's query to see if they match. I created an abstraction, GeoShape, extended by subclasses named PointDistance... and CartesianBox.... to support different queried shapes so that the filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
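
For readers unfamiliar with the encoding, the following is a minimal Java sketch of the standard geohash scheme that the description above relies on. It is not the code in GeoHashPrefixFilter.patch or in GeoHashUtils; the class name GeoHashSketch and the methods encode and prefixes are hypothetical names chosen for illustration. encode halves the lat-lon box one bit at a time (longitude first) and emits one base-32 character per five bits; prefixes lists a geohash at every grid level, which is roughly what indexing "at each grid level" would have to produce.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not the GeoHashUtils or GeoHashPrefixFilter code from the patch.
final class GeoHashSketch {
    // Standard geohash base-32 alphabet (omits a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Encodes a point as a geohash of the given length by repeatedly halving the box. */
    static String encode(double lat, double lon, int length) {
        double latMin = -90.0, latMax = 90.0;
        double lonMin = -180.0, lonMax = 180.0;
        StringBuilder hash = new StringBuilder(length);
        boolean splitLon = true; // bits alternate lon, lat, lon, ... starting with longitude
        int bitCount = 0;
        int value = 0;
        while (hash.length() < length) {
            if (splitLon) {
                double mid = (lonMin + lonMax) / 2.0;
                if (lon >= mid) { value = (value << 1) | 1; lonMin = mid; }
                else            { value = value << 1;       lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2.0;
                if (lat >= mid) { value = (value << 1) | 1; latMin = mid; }
                else            { value = value << 1;       latMax = mid; }
            }
            splitLon = !splitLon;
            if (++bitCount == 5) { // five bits make one base-32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** Returns the geohash at every grid level: "u4p" gives [u, u4, u4p]. */
    static List<String> prefixes(String geohash) {
        List<String> levels = new ArrayList<>(geohash.length());
        for (int end = 1; end <= geohash.length(); end++) {
            levels.add(geohash.substring(0, end));
        }
        return levels;
    }
}
```

Because each character only narrows the box chosen by the characters before it, a point's geohash at one length is always a prefix of its geohash at any greater length. Characters in odd positions consume three longitude bits and two latitude bits (8 columns by 4 rows); characters in even positions do the reverse (4 by 8). Since index terms are sorted, all points inside a grid square sit contiguously under that square's prefix, which is what makes seeking to a prefix useful.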