lucene-solr-dev mailing list archives

From "Uwe Schindler (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
Date Tue, 12 May 2009 18:50:45 GMT

    [ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708560#action_12708560 ]

Uwe Schindler commented on SOLR-773:
------------------------------------

bq. Also, how does the TrieRange stuff factor into this?

LocalLucene does something similar to TrieRange, but in two dimensions. It stores the latitude
and longitude in one field as the number of a small rectangle (Cartesian tier), and the lower
precisions are simply bigger rectangles (I think they are squares). The effect is that you
only need one field name for the search, but you have the problem of limited precision.
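
To make the tier idea concrete, here is a minimal sketch of numbering grid cells at several precisions. It assumes a plain equirectangular grid with 2**tier cells per axis; the names tier_cell_id and tier_terms are hypothetical, and LocalLucene's real projection and numbering scheme may well differ.

```python
def tier_cell_id(lat, lon, tier):
    """Number of the grid cell containing (lat, lon) at the given tier.

    Assumption for this sketch: tier t splits the world into
    2**t x 2**t equal cells in plain lat/lon space.
    """
    cells = 1 << tier
    # Normalise to [0, 1), then clamp so lat=90 and lon=180 land in the last cell.
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return row * cells + col


def tier_terms(lat, lon, tiers=range(1, 16)):
    """One cell number per tier: low tiers are big cells, high tiers small ones."""
    return {tier: tier_cell_id(lat, lon, tier) for tier in tiers}


if __name__ == "__main__":
    for tier, cell in tier_terms(53.08, 8.80, tiers=(2, 6, 10)).items():
        print(tier, cell)
```

Two nearby points share a cell number at a coarse tier and get different numbers at a fine one, which is where the limited precision comes from.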

TrieRange, on the other hand, is more universal: it works for any numeric search and is not limited
to geo. The bounding box search in Solr as proposed in the issue can also be done simply with
two TrieRangeQueries on long (e.g. by scaling the lat/lon by a factor >1) or float fields.
A comparison of speed and index size between LocalLucene and TrieRange would be interesting.
Both can be done easily with Solr, but I have had no time for it.
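
As a rough illustration of the bounding box as two numeric ranges, the sketch below stores lat/lon as scaled integers and filters on one range per field. It is plain Python over a list of dicts, not the Lucene or Solr API; SCALE, to_fixed, make_doc and bbox_hits are names made up for this example, and the 1e7 scale factor is an assumption.

```python
SCALE = 10_000_000  # assumed fixed-point factor: 1e-7 degree resolution


def to_fixed(degrees):
    """Scale a coordinate in degrees to an integer, as one would for a long field."""
    return round(degrees * SCALE)


def make_doc(doc_id, lat, lon):
    return {"id": doc_id, "lat": to_fixed(lat), "lon": to_fixed(lon)}


def bbox_hits(docs, min_lat, max_lat, min_lon, max_lon):
    """A bounding box is just two closed ranges, one on each numeric field."""
    lat_lo, lat_hi = to_fixed(min_lat), to_fixed(max_lat)
    lon_lo, lon_hi = to_fixed(min_lon), to_fixed(max_lon)
    return [
        d["id"]
        for d in docs
        if lat_lo <= d["lat"] <= lat_hi and lon_lo <= d["lon"] <= lon_hi
    ]


docs = [make_doc("bremen", 53.08, 8.80), make_doc("sydney", -33.87, 151.21)]

if __name__ == "__main__":
    print(bbox_hits(docs, 50.0, 55.0, 5.0, 10.0))
```

In an index, each of the two range checks would be a range query on its own field and the two would be combined with AND; the linear scan here only stands in for that.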

For our case (PANGAEA) we have another problem that is only solvable with TrieRange, not LocalLucene:
our datasets themselves are bounding boxes, and if the user enters a bounding box, a dataset is a hit if
the two intersect. This can easily be done with two half-open ranges. There is a small speed
impact because the half-open ranges may hit very many TermDocs at the lower precisions,
but maybe I will create a special combined filter that collects TermDocs into only one BitSet,
so you can combine these ranges easily (but I have no idea how to make a sensible API for that).
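
The intersection test can be sketched as follows, assuming each dataset is indexed with four numeric fields (min_lat, max_lat, min_lon, max_lon; these field names and the helper functions are hypothetical). Per axis, two half-open ranges are enough: the dataset's minimum must not exceed the query's maximum, and the dataset's maximum must not fall below the query's minimum. The sketch ignores boxes that cross the date line.

```python
import math


def half_open_range(field, lower=-math.inf, upper=math.inf):
    """A range on one field with only one bound set; the other side stays open."""
    def matches(doc):
        return lower <= doc[field] <= upper
    return matches


def intersects_query(q_min_lat, q_max_lat, q_min_lon, q_max_lon):
    """Build a predicate: does a dataset's bounding box intersect the query box?"""
    clauses = [
        half_open_range("min_lat", upper=q_max_lat),  # dataset starts before query ends
        half_open_range("max_lat", lower=q_min_lat),  # dataset ends after query starts
        half_open_range("min_lon", upper=q_max_lon),
        half_open_range("max_lon", lower=q_min_lon),
    ]
    return lambda doc: all(clause(doc) for clause in clauses)


dataset = {"min_lat": 10.0, "max_lat": 20.0, "min_lon": 10.0, "max_lon": 20.0}

if __name__ == "__main__":
    print(intersects_query(15.0, 25.0, 15.0, 25.0)(dataset))
    print(intersects_query(21.0, 30.0, 0.0, 5.0)(dataset))
```

Each half-open range can match a large share of the index on its own, which is the speed impact mentioned above; only the conjunction of the clauses is selective.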

Another idea for using TrieRange for geo search is to lay a Hilbert curve over the earth and just
do a range around the position on this curve (look at the picture on http://en.wikipedia.org/wiki/Hilbert_curve
and the idea becomes clear). As far as I know, geohash works with this Hilbert
curve (it is the position on this curve), so if you index the binary geohash as a long with
TrieRange, you could do this range very simply (correct me if I am wrong!). The drawback is
that you will only find square areas (so the use case is: find all phone cells around (lat,lon)).
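
A minimal sketch of the curve idea, assuming an n x n grid (n a power of two) laid over plain lat/lon space. hilbert_index follows the well-known iterative mapping from grid coordinates to curve position; cell_of and curve_range are hypothetical helpers, and the sketch makes no claim about how geohash itself orders its cells.

```python
def hilbert_index(n, x, y):
    """Position of grid cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented correctly.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


def cell_of(lat, lon, n):
    """Grid cell for a coordinate (assumed equirectangular grid)."""
    x = min(int((lon + 180.0) / 360.0 * n), n - 1)
    y = min(int((lat + 90.0) / 180.0 * n), n - 1)
    return x, y


def curve_range(lat, lon, n, radius):
    """Closed range of curve positions around a point, clamped to the curve."""
    d = hilbert_index(n, *cell_of(lat, lon, n))
    return max(0, d - radius), min(n * n - 1, d + radius)


if __name__ == "__main__":
    print(curve_range(53.08, 8.80, 1024, 50))
```

The curve position would be indexed as a single long, so the search becomes one numeric range on one field.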

I would recommend the following:
If you need standard queries like "find all phone cells around a position", use LocalLucene.
If you need full flexibility, just treat lat/lon or whatever CRS (Gauss-Krüger etc.) as two
numeric values, on which you can do SQL-like "between", ">", "<", ">=" and "<=" searches
very fast.


> Incorporate Local Lucene/Solr
> -----------------------------
>
>                 Key: SOLR-773
>                 URL: https://issues.apache.org/jira/browse/SOLR-773
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch,
SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch,
SOLR-773.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr components, but we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

