lucene-dev mailing list archives

From "Bill Bell (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-2155) Geospatial search using geohash prefixes
Date Sat, 12 Feb 2011 02:23:57 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12993802#comment-12993802 ]

Bill Bell commented on SOLR-2155:
---------------------------------

David,

This seems to be pretty fast, since the results are constrained by d=<km> first and only
then are the closest points found by distance from pt. It is at least as fast as geodist().
geodist() uses the same algorithm, and if you were to duplicate the lat,long in separate rows,
you would be searching on the same number of fields. The one area where we could improve
performance is the split() regex call. We could put the values into separate fields to speed
that up, but I am not an expert on the API to get dynamic fields. For example:
<dynamicField name="storemv_*" type="string" indexed="true" stored="true"/>.
My question is: what is the API call to get the stored fields for a document whose names
begin with "storemv_"? If we do that, we can use a copy field for the lat,long values.
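
Until the dynamic-field question is settled, one way to avoid the regex in split() is to
locate the comma with indexOf(). This is only a minimal sketch, assuming each stored value
is a single "lat,long" string; LatLonParser and parse() are hypothetical names and are not
part of the patch.

```java
public final class LatLonParser {
    private LatLonParser() {}

    /**
     * Parses a "lat,long" string into {lat, lon} without a regex split.
     * Assumes exactly one comma separates the two values.
     */
    public static double[] parse(String value) {
        int comma = value.indexOf(',');
        if (comma < 0 || comma != value.lastIndexOf(',')) {
            throw new IllegalArgumentException("Expected \"lat,long\" but got: " + value);
        }
        double lat = Double.parseDouble(value.substring(0, comma).trim());
        double lon = Double.parseDouble(value.substring(comma + 1).trim());
        return new double[] {lat, lon};
    }
}
```

Whether this is measurably faster than split() would need to be benchmarked against the
actual index.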

I copied the Haversine function that Grant added in ./java/org/apache/solr/search/function/distance/HaversineConstFunction.java,
since I felt geodist() and geomultidist() should use the same distance calculation, given
how similarly they are named. But you are right, we should just convert both functions to
use the DistanceUtils class.
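
For reference, the Haversine calculation itself is short. The following is a standalone
sketch of the textbook formula, not the code in HaversineConstFunction or DistanceUtils;
the class name, the method name, and the 6371.0 km mean Earth radius are assumptions made
for illustration.

```java
public final class HaversineSketch {
    // Assumed mean Earth radius in kilometers; the real code may use a different constant.
    static final double EARTH_RADIUS_KM = 6371.0;

    private HaversineSketch() {}

    /** Great-circle distance in km between two points given in degrees. */
    public static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDLat = Math.sin((phi2 - phi1) / 2);
        double sinHalfDLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDLat * sinHalfDLat
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLon * sinHalfDLon;
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```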

I cannot see how we can get accurate distances using boxes (but you know more about geohash
than I do); it would only be an approximation. The boxes work great for filtering. Then we
need something to calculate the distance from pt to the value in the index. If you want to
approximate the distance then boxes would work, but you kinda have that with the filter, right?
The use case that I am trying to solve is: millions of locations, but the user only selects
d=10, 20, 50, or 100, so the results are much smaller than the overall population of points.
Then sort those results by distance.
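
The use case above can be sketched in plain Java as "filter by d, then sort by exact
distance". This is an in-memory illustration only, not how the Solr function or filter is
implemented; Doc, Hit, search(), and the 6371.0 km radius are all hypothetical. Each
document may carry several points, and its distance is taken as the distance to its
closest point.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class NearestWithinRadius {
    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean radius

    /** Hypothetical document: an id plus one or more {lat, lon} points in degrees. */
    public static final class Doc {
        final String id;
        final double[][] points;
        public Doc(String id, double[][] points) { this.id = id; this.points = points; }
    }

    /** A matching document together with its exact distance from the query point. */
    public static final class Hit {
        public final String id;
        public final double distanceKm;
        Hit(String id, double distanceKm) { this.id = id; this.distanceKm = distanceKm; }
    }

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1), phi2 = Math.toRadians(lat2);
        double sLat = Math.sin((phi2 - phi1) / 2);
        double sLon = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sLat * sLat + Math.cos(phi1) * Math.cos(phi2) * sLon * sLon;
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Distance from (lat, lon) to the closest of the document's points. */
    static double minDistanceKm(Doc doc, double lat, double lon) {
        double best = Double.POSITIVE_INFINITY;
        for (double[] p : doc.points) {
            best = Math.min(best, haversineKm(lat, lon, p[0], p[1]));
        }
        return best;
    }

    /** Keeps documents within dKm of (lat, lon), sorted by ascending distance. */
    public static List<Hit> search(List<Doc> docs, double lat, double lon, double dKm) {
        List<Hit> hits = new ArrayList<>();
        for (Doc doc : docs) {
            double dist = minDistanceKm(doc, lat, lon);
            if (dist <= dKm) {
                hits.add(new Hit(doc.id, dist));
            }
        }
        hits.sort(Comparator.comparingDouble((Hit h) -> h.distanceKm));
        return hits;
    }
}
```

Because the radius filter runs first, the sort only touches the documents that survive
it, which is the point of constraining by d before ordering.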

There is a use case that says show me the top 100 closest documents, and I don't care about
the exact order. You solved that already with the filter.

I would vote for making geomultidist() work faster, but I need accurate distances. This code
is pretty good; we can create a few test cases and submit it for inclusion, since it works
with LatLon and geohash...  For LatLon this is pretty much as good as it gets.

Bill






> Geospatial search using geohash prefixes
> ----------------------------------------
>
>                 Key: SOLR-2155
>                 URL: https://issues.apache.org/jira/browse/SOLR-2155
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: David Smiley
>         Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
> SOLR.2155.p2.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on documents
> that have a variable number of points.  This scenario occurs when there is location extraction
> (i.e. via a "gazetteer") occurring on free text.  None, one, or many geospatial locations might
> be extracted from any given document and users want to limit their search results to those
> occurring in a user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash
> prefix based filter.  A geohash refers to a lat-lon box on the earth.  Each successive character
> added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the
> geohash) grid.  The first step in this scheme is figuring out which geohash grid squares cover
> the user's search query.  I've added various extra methods to GeoHashUtils (and added tests)
> to assist in this purpose.  The next step is an actual Lucene Filter, GeoHashPrefixFilter,
> that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the
> index.  Once a matching geohash grid is found, the points therein are compared against the
> user's query to see if they match.  I created an abstraction GeoShape extended by subclasses
> named PointDistance... and CartesianBox.... to support different queried shapes so that the
> filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.
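
To illustrate the prefix property the description relies on, here is a minimal sketch of
standard geohash encoding. It is not the GeoHashUtils code from the patch; GeoHashSketch
and encode() are hypothetical names. Each character carries 5 bits that alternately halve
the longitude and latitude ranges, so a longer geohash always starts with the shorter
geohash of its enclosing box.

```java
public final class GeoHashSketch {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    private GeoHashSketch() {}

    /** Encodes a point (degrees) as a geohash with the given number of characters. */
    public static String encode(double lat, double lon, int length) {
        double latMin = -90, latMax = 90, lonMin = -180, lonMax = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean refineLon = true; // bits alternate, starting with longitude
        int bits = 0;             // bits collected for the current character
        int ch = 0;               // value of the current character (0..31)
        while (hash.length() < length) {
            if (refineLon) {
                double mid = (lonMin + lonMax) / 2;
                if (lon >= mid) { ch = (ch << 1) | 1; lonMin = mid; }
                else            { ch <<= 1;           lonMax = mid; }
            } else {
                double mid = (latMin + latMax) / 2;
                if (lat >= mid) { ch = (ch << 1) | 1; latMin = mid; }
                else            { ch <<= 1;           latMax = mid; }
            }
            refineLon = !refineLon;
            if (++bits == 5) {    // 5 bits make one base-32 character
                hash.append(BASE32.charAt(ch));
                bits = 0;
                ch = 0;
            }
        }
        return hash.toString();
    }
}
```

Since all points in a grid square share that square's geohash as a prefix, seeking to a
prefix in a sorted term index lands on the terms for exactly that square.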

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

