lucene-solr-dev mailing list archives

From "patrick o'leary (JIRA)" <j...@apache.org>
Subject [jira] Commented: (SOLR-773) Incorporate Local Lucene/Solr
Date Tue, 12 May 2009 20:28:45 GMT

    [ https://issues.apache.org/jira/browse/SOLR-773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12708605#action_12708605
] 

patrick o'leary commented on SOLR-773:
--------------------------------------

Sorry for not getting into this sooner.

Let's take a step back for a second and ask a couple of questions; my thoughts are provided.

1) What is the goal we want to achieve?
   - Provide a first iteration of a geographical search entity to SOLR
   - Bring a popular external plugin in out of the cold and into the ASF and Solr. This helps
Solr users out and increases the number of developers from 1 to many.

2) What is the level of commitment, and road map of spatial solutions in lucene and solr?
   - The primary goal of Solr is to be a text search engine, not a GIS search engine. There are
other and better ways to do GIS search without reinventing the wheel and shoehorning it into Lucene
   (e.g. persistent doc id mappings that can be referenced outside of Lucene, so things like
PostGIS and other tools can be used).
   - We can never fully solve everyone's needs at once. Let's start with what we have and
iterate upon it.
   - I'm happy with any improvements as long as they keep to two goals: A. don't make it stupid,
B. don't make it complex.

3) Raw math through trie data structures, spatial ids (geohash), or tier ids (Cartesian tiers):
which one?
   - Why not all? Again, we can't solve everyone's needs, so why not let them have the tools
to help themselves.

As for benchmarking, I have performed some recently using tdouble precision 0 on
~1 million docs covering the state of NY.
Top density was ~300,000 in the Manhattan and Brooklyn area.

Returning all results, avg of 100 hits:
Trie Double: 108ms
Cartesian Tier: 12ms

The reason for the difference is that with Trie ranges you are doing 2 sets of range
filters/queries.
With Cartesian tiers you are doing 1 iteration over maybe 4 to 16 fielded ids.
Switching the _localTier fields from sdouble to tdouble might improve that further. I haven't
tried, since 12ms is something I can live with.
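
To illustrate why a tier lookup touches only a handful of ids, here is a minimal sketch of a grid-cell scheme. It is not the Local Lucene encoding: the class `TierGridSketch`, the row-major id layout, and the plain lat/lon grid with 2^level cells per axis are all assumptions made for the example. A query box is turned into the list of cell ids it overlaps, and those ids would then be matched as ordinary terms.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical grid-cell ids; NOT the actual Local Lucene tier encoding.
final class TierGridSketch {

    // Assumption: each tier level splits both axes into 2^level cells.
    static int cellsPerAxis(int level) {
        return 1 << level;
    }

    // Column index for a longitude in [-180, 180], clamped to the grid.
    static int col(double lon, int level) {
        int n = cellsPerAxis(level);
        int c = (int) Math.floor((lon + 180.0) / 360.0 * n);
        return Math.min(Math.max(c, 0), n - 1);
    }

    // Row index for a latitude in [-90, 90], clamped to the grid.
    static int row(double lat, int level) {
        int n = cellsPerAxis(level);
        int r = (int) Math.floor((lat + 90.0) / 180.0 * n);
        return Math.min(Math.max(r, 0), n - 1);
    }

    // Row-major cell id, so consecutive ids in a row are neighbouring cells.
    static long cellId(double lat, double lon, int level) {
        return (long) row(lat, level) * cellsPerAxis(level) + col(lon, level);
    }

    // All cell ids overlapped by a bounding box (assumes min <= max on both axes).
    static List<Long> cellsForBox(double minLat, double minLon,
                                  double maxLat, double maxLon, int level) {
        List<Long> ids = new ArrayList<>();
        int n = cellsPerAxis(level);
        for (int r = row(minLat, level); r <= row(maxLat, level); r++) {
            for (int c = col(minLon, level); c <= col(maxLon, level); c++) {
                ids.add((long) r * n + c);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // A small box around (0, 0) at level 2 overlaps four cells.
        System.out.println(cellsForBox(-10, -10, 10, 10, 2));
    }
}
```

Picking a level whose cells are roughly the size of the search radius keeps the id list short, which is the property the 4 to 16 figure above relies on.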

However, the distance calculation is the killer: 300,000 took about 1.8 seconds in a single
thread on a 3.2GHz machine.
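
For scale, 1.8 seconds over 300,000 documents is about 6 microseconds per document. The sketch below shows the kind of per-document trigonometry involved, using the haversine great-circle formula. This is an assumption for illustration only: the comment does not say which distance formula was benchmarked, and the class name `HaversineSketch` and the 6371 km mean Earth radius are choices made for the example.

```java
// Illustrative great-circle distance; not necessarily the formula that was benchmarked.
final class HaversineSketch {

    // Assumption: mean Earth radius in kilometres.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine distance between two lat/lon points given in degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * sinLon * sinLon;
        // Clamp guards against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    public static void main(String[] args) {
        // Rough Manhattan to Brooklyn style distance, coordinates chosen for illustration.
        System.out.println(distanceKm(40.78, -73.97, 40.65, -73.95));
    }
}
```

Each call costs several sin/cos/asin evaluations, so running it only on documents that already passed a cheap tier or range filter is what keeps the total manageable.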
 
I was working on some additional features in locallucene, such as polylines and convex hulls,
which, using the Cartesian tier ids,
can give some basic quick features such as intersect and contains. A nifty side effect of having
sorted ids is nearby results.

Also, faceting on tier ids can give you hot spot results.
One final feature: the projection method is an implementation of IProjector, which allows
you to create your own projection.
Currently I'm using Sinusoidal, but you can do your own, such as
- Google Mercator (I use a similar quad grid concept, just a different projection method)
- Open Map
etc.
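
As a rough picture of what a pluggable projection looks like, here is a sketch with a sinusoidal and a spherical Mercator implementation on a unit sphere. The `Projector` interface and its `project` method are hypothetical and are not the actual IProjector signature; only the two projection formulas themselves are standard.

```java
// Hypothetical pluggable projection; the interface below is NOT the real IProjector API.
final class ProjectionSketch {

    // Assumed shape: degrees in, projected {x, y} on a unit sphere out.
    interface Projector {
        double[] project(double latDeg, double lonDeg);
    }

    // Sinusoidal: x = lambda * cos(phi), y = phi (central meridian at 0).
    static final Projector SINUSOIDAL = (latDeg, lonDeg) -> {
        double phi = Math.toRadians(latDeg);
        double lambda = Math.toRadians(lonDeg);
        return new double[] { lambda * Math.cos(phi), phi };
    };

    // Spherical Mercator: x = lambda, y = ln(tan(pi/4 + phi/2)).
    static final Projector MERCATOR = (latDeg, lonDeg) -> {
        double phi = Math.toRadians(latDeg);
        double lambda = Math.toRadians(lonDeg);
        return new double[] { lambda, Math.log(Math.tan(Math.PI / 4 + phi / 2)) };
    };

    public static void main(String[] args) {
        double[] s = SINUSOIDAL.project(40.7, -74.0);
        double[] m = MERCATOR.project(40.7, -74.0);
        System.out.println(s[0] + ", " + s[1]);
        System.out.println(m[0] + ", " + m[1]);
    }
}
```

Whatever grid is laid over the projected plane stays the same; swapping the projector only changes how lat/lon pairs land on that plane.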

There's a lot that can be done, but we should stay focused on the primary goals, and iterate,
iterate, iterate.

> Incorporate Local Lucene/Solr
> -----------------------------
>
>                 Key: SOLR-773
>                 URL: https://issues.apache.org/jira/browse/SOLR-773
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Grant Ingersoll
>            Assignee: Grant Ingersoll
>            Priority: Minor
>         Attachments: lucene.tar.gz, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch,
SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773-local-lucene.patch, SOLR-773.patch,
SOLR-773.patch, spatial-solr.tar.gz
>
>
> Local Lucene has been donated to the Lucene project.  It has some Solr components, but
we should evaluate how best to incorporate it into Solr.
> See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

