lucene-dev mailing list archives

From "David Smiley (Commented) (JIRA)" <j...@apache.org>
Subject [jira] [Commented] (SOLR-2155) Geospatial search using geohash prefixes
Date Tue, 27 Sep 2011 05:57:13 GMT

    [ https://issues.apache.org/jira/browse/SOLR-2155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13115244#comment-13115244 ]

David Smiley commented on SOLR-2155:
------------------------------------

Your use-case is a feature I have intended to have LSP address in a direct manner when I have
time. In the meantime, there are a couple of approaches that should work.

The first approach that comes to mind is to use the LSP QuadPrefixTree with LSP's ability
to index rectangles.  You would treat the x dimension as time, and ignore the y dimension
(use 0). What helps make this possible is LSP's unique ability to index shapes other than
points, and in an efficient manner.  The only spatial filter query operation that LSP supports
right now is an intersection. If your query is simply a point (a specific time) then this
is fine, or if it is a time duration and you want all stores that were open for at least part
of this time, then it's fine. If your query is a time duration and you want it to reside completely
_within_ an indexed time duration, then no-can-do for now.  Based on the nature of your use-case,
it may suffice to use multiple spatial filter queries, each one a point (time) at each hour
interval of the desired query duration.
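
To make the intersection semantics and the hourly point-sampling workaround concrete, here is a minimal sketch in plain Python. It does not use LSP at all: the function names and the choice of integer minutes as the time unit are assumptions made purely for illustration. Note that sampling only approximates containment, since a closed period that falls entirely between two sample points would go unnoticed.

```python
def intersects(indexed, query):
    """True if two closed intervals (start, end) share at least one point."""
    return indexed[0] <= query[1] and query[0] <= indexed[1]


def sample_points(query, step=60):
    """Points at every `step` units across the query duration, end included."""
    start, end = query
    points = list(range(start, end, step))
    points.append(end)
    return points


def matches_all_samples(indexed_intervals, query, step=60):
    """Emulate 'query lies within the indexed durations' by requiring every
    sample point to intersect some indexed interval (one point query each)."""
    return all(
        any(intersects(interval, (p, p)) for interval in indexed_intervals)
        for p in sample_points(query, step)
    )
```

For example, with a store indexed as open from minute 540 to minute 1020, a query of (960, 1080) intersects the indexed interval but fails the sampled check, because the sample at 1080 falls outside the opening hours.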

The second approach is similar to your suggestion, but with y = closing time rather than the delta.
 y should always be > x. I just sketched some sample Venn diagrams to verify this approach.
If you want to find documents with an indexed duration that completely contains your query
time, then you do a bounding box filter query with x ranging from 0 to starttime and y ranging
from endtime to max (where max is the maximum indexable time). When you initialize the LSP QuadPrefixTree you need to
tell it the range of values.  Some time ago when writing tests, I discovered it simply can't
handle Double.MAX_VALUE, but I imagine it will handle your 30,000.  If you want to use this
patch (SOLR-2155) and not LSP then you will instead have to map your times to latitude-longitude
ranges and use a Geohash grid length with granularity sufficient to differentiate your smallest
unit of time (5min).
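
The bounding-box trick of the second approach can be checked with a small sketch in plain Python. Each indexed duration is treated as a single point (x = start, y = end); the helper names and the sample values are hypothetical and this is not LSP or Solr code, only an illustration of why the box selects exactly the durations that contain the query.

```python
def containment_box(query_start, query_end, max_time):
    """Box over (x=start, y=end) points whose durations contain the query.

    Returns ((x_min, x_max), (y_min, y_max)).
    """
    return (0, query_start), (query_end, max_time)


def in_box(point, box):
    """True if the (x, y) point lies inside the closed bounding box."""
    (x_min, x_max), (y_min, y_max) = box
    x, y = point
    return x_min <= x <= x_max and y_min <= y <= y_max
```

A point matches only when its start is at or before the query start and its end is at or after the query end, which is the containment condition.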

I think the 2nd approach is simplest and ideal based on what you've said about your needs.

If you want help with LSP then email me directly: david.w.smiley@gmail.com
                
> Geospatial search using geohash prefixes
> ----------------------------------------
>
>                 Key: SOLR-2155
>                 URL: https://issues.apache.org/jira/browse/SOLR-2155
>             Project: Solr
>          Issue Type: Improvement
>            Reporter: David Smiley
>            Assignee: Grant Ingersoll
>         Attachments: GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch, GeoHashPrefixFilter.patch,
SOLR-2155_GeoHashPrefixFilter_with_sorting_no_poly.patch, SOLR.2155.p3.patch, SOLR.2155.p3tests.patch
>
>
> There currently isn't a solution in Solr for doing geospatial filtering on documents
that have a variable number of points.  This scenario occurs when there is location extraction
(i.e. via a "gazetteer") occurring on free text.  None, one, or many geospatial locations might
be extracted from any given document and users want to limit their search results to those
occurring in a user-specified area.
> I've implemented this by furthering the GeoHash based work in Lucene/Solr with a geohash
prefix based filter.  A geohash refers to a lat-lon box on the earth.  Each successive character
added further subdivides the box into a 4x8 (or 8x4 depending on the even/odd length of the
geohash) grid.  The first step in this scheme is figuring out which geohash grid squares cover
the user's search query.  I've added various extra methods to GeoHashUtils (and added tests)
to assist in this purpose.  The next step is an actual Lucene Filter, GeoHashPrefixFilter,
that uses these geohash prefixes in TermsEnum.seek() to skip to relevant grid squares in the
index.  Once a matching geohash grid is found, the points therein are compared against the
user's query to see if it matches.  I created an abstraction GeoShape extended by subclasses
named PointDistance... and CartesianBox.... to support different queried shapes so that the
filter need not care about these details.
> This work was presented at LuceneRevolution in Boston on October 8th.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        


