lucene-solr-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Solr Wiki] Update of "SolrAdaptersForLuceneSpatial4" by DavidSmiley
Date Fri, 05 Oct 2012 03:49:05 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Solr Wiki" for change notification.

The "SolrAdaptersForLuceneSpatial4" page has been changed by DavidSmiley:
http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4?action=diff&rev1=4&rev2=5

Comment:
Updated intro text; more to come...

  
  = Lucene / Solr 4 Spatial =
  
- This document describes how to use the new spatial functionality in Lucene / Solr 4.  The
bulk of the implementation lives in the new Lucene spatial module in v4 committed on March
13th.  It replaces the former "Lucene spatial contrib" in v3.  The Solr piece is small as
it only needs to provide field types which are essentially adapters to the code in the Lucene
spatial module.  Furthermore, understand that the shape implementations and other core spatial
code that isn't related to Lucene is held in another new open-source project called Spatial4j.
 Presently, polygon support requires an additional dependency -- JTS.  As of this writing,
28-June 2012, the Solr portion has yet to be introduced into Solr trunk. It should come into
Solr via SOLR-3304 "soon".
+ This document describes how to use the new spatial functionality in Lucene / Solr 4.  The
bulk of the implementation lives in the new Lucene 4 spatial module, which replaces the former
"Lucene spatial contrib" in v3.  The Solr piece is small, as it only needs to provide field
types that are essentially adapters to the code in the Lucene spatial module.  The shape
implementations and other core spatial code that isn't related to
Lucene are held in another new open-source project called [[https://github.com/spatial4j/spatial4j|Spatial4j]].
 Presently, polygon support requires an additional dependency -- [[http://sourceforge.net/projects/jts-topo-suite/|JTS]].
  
  
  == New features, over Solr 3 spatial ==
  
- Note: "Solr 3 spatial" refers to the spatial support introduced in that version of Solr
which still exists in v4.  Solr 3 spatial does ''not'' actually use Lucene 3's spatial contrib
module aside from DistanceUtils.java.
+ Note: "Solr 3 spatial" refers to the spatial support introduced in that version of Solr
which still exists in v4.  Except for a small utility class, Solr 3 spatial does ''not'' actually
use Lucene 3's defunct spatial contrib module.
  
- These features describe what developer-users of Lucene/Solr 4 will appreciate.  Under the
hood, it's a framework designed to be extended for different so-called spatial strategies.
 I'll assume here the RecursivePrefixTreeStrategy as it should address most use-cases and
it's has the best tests.
+ The features below are the ones that developer-users of Lucene/Solr 4 will appreciate.  Under the
hood, it's a framework designed to be extended for different so-called "spatial strategies".
 I'll assume the RecursivePrefixTreeStrategy here, as it should address most use-cases and
it has the best tests.
  
-  * Multi-value indexes.  This is key for any project that geocodes natural language documents,
since a variable number of locations are extracted from text.
-  * Index shapes with area, not just points.  An indexed shape is essentially pixelated (i.e.
gridded) to a configured resolution per shape.  Note: If extremely high precision of the edges
of the shape needs to be retained for accurate searching, then this solution probably won't
scale well compared to other approaches such as those that index the bounding box but retain
the original shape vector.  Note: this capability sorely needs testing.
-  * A polygon shape.  It can be the indexed shape or query shape.  Note: This requires the
JTS dependency.  The polygon assumes a Mercator / Cartesian projection, and consequently doesn't
support pole-wrap.  As of 1 June 2012 in Spatial4j 0.3-SNAPSHOT, it does support dateline
crossing.
+  * Multi-valued indexed fields.  This is critical for storing the results of automatic place
extraction from text using natural language processing techniques with a gazetteer (a variant
of "geocoding"), since a variable number of locations will be found.
+  * Index shapes with area, not just points.  An indexed shape is essentially pixelated (i.e.
gridded) to a configured resolution per shape.  By default that resolution is defined by a
percentage of the overall shape size, and it applies to query shapes too.  Note: If extremely
high precision of shape edges needs to be retained for accurate indexing, then this solution
probably won't scale too well at indexing time (big indexes, slow indexing).  On the other
hand, query shapes generally scale well to the maximum configured precision regardless of
shape size.  Note: indexing shapes with area sorely [[https://issues.apache.org/jira/browse/LUCENE-4419|needs
testing]].
+  * Polygon, LineString and other new shapes.  All shapes are supported as indexed shapes
and query shapes.  Note: Shapes other than point, rectangle and circle are supported via JTS
-- an otherwise optional dependency.  JTS views the world as a flat plane; the latitude and
longitude are mapped to this plane directly.  It uses Euclidean math operations, not geodesic
ones.  By and large this isn't a problem, although it can be if the vertices are particularly
far apart longitudinally.  Spatial4j adapts shapes that cross the dateline to be compatible
with JTS, and you shouldn't notice a problem (notwithstanding unknown bugs).  It does not yet
support shapes covering the poles.  Consequently, if you want to index or query by the
Antarctica polygon, for example, you are out of luck for now.
+  * Rectangles with user-specifiable corners.  Oddly, Solr 3 spatial only supports the bounding
box of a circle. 
-  * Multi-value distance sort / score boost.  Note: this is a preliminary unoptimized implementation
that uses a fair amount of RAM. 
+  * Multi-value distance sort / score boost.  Note: this is a preliminary unoptimized implementation
that uses a fair amount of RAM.  An alternative should be provided in the future.
-  * Configurable precision which can vary per shape at both index & query time.  This
enhances the performance.  Solr 3 indexes and queries based on the full precision of a double
for latitude and longitude, which is excessive for nearly any use-case.
-  * Fast filtering.  The code was benchmarked once showing it outperforms Solr 3's "LatLonType"
at its own game (single valued indexed points), and a 3rd party anecdotally reported it was
faster on his large index.  It hasn't been benchmarked in well over a year now though, and
this is a TODO item.  Also, Solr 3 LatLonType sometimes requires all the points to be in memory,
whereas the new spatial module here doesn't for filtering.
+  * Configurable precision which can vary per shape at query time (and sort of at index time).
 This improves performance.
+  * Fast filtering.  The code was benchmarked once, showing it outperforms Solr 3's "LatLonType"
at its own game (single-valued indexed points), and several third parties anecdotally reported
the same, especially for multi-million document indices.  It is based on SOLR-2155, which was
benchmarked in January 2010, so a new benchmark is a TODO item.  Also, Solr 3 LatLonType sometimes
requires all the points to be in memory, whereas the new spatial module here doesn't for filtering.
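
To make the "resolution defined by a percentage of the overall shape size" point above concrete, here is a minimal sketch.  It is not the spatial module's code.  It assumes a hypothetical grid whose level-0 cell spans 360 degrees and whose cell width halves at each deeper level; the names `cell_width_deg`, `level_for_shape` and `err_pct`, and the 2.5% value used below, are made up for illustration.

```python
WORLD_WIDTH_DEG = 360.0  # assumption: the level-0 cell spans the full longitude range


def cell_width_deg(level: int) -> float:
    """Width of one grid cell, assuming every level halves the cell width."""
    return WORLD_WIDTH_DEG / (2 ** level)


def level_for_shape(shape_extent_deg: float, err_pct: float, max_level: int) -> int:
    """Return the coarsest grid level whose cells are no wider than the allowed error.

    The allowed error is err_pct (a fraction, e.g. 0.025) of the shape's extent.
    The result never exceeds max_level, the maximum configured precision.
    """
    if shape_extent_deg <= 0:
        raise ValueError("shape_extent_deg must be positive")
    if not 0 < err_pct <= 1:
        raise ValueError("err_pct must be in (0, 1]")
    allowed_error = shape_extent_deg * err_pct
    level = 0
    # Descend until the cells are fine enough or the configured cap is reached.
    while level < max_level and cell_width_deg(level) > allowed_error:
        level += 1
    return level


if __name__ == "__main__":
    for extent in (10.0, 1.0, 0.01):
        print(extent, level_for_shape(extent, err_pct=0.025, max_level=20))
```

Under these assumptions a 10-degree-wide shape resolves to level 11 and a 1-degree-wide shape to level 14, so both end up roughly the same number of cells across.  A very small shape simply hits the `max_level` cap.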
  
  Of course, the basics in Solr 3 not mentioned here are implemented in this framework too, for
example lat-lon bounding boxes and circles.
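
The remark above that flat-plane (Euclidean) math can matter when vertices are far apart longitudinally can be illustrated with a small sketch.  It does not use JTS or Spatial4j; `flat_midpoint` and `great_circle_midpoint` are hypothetical helper names, and the Earth is assumed to be a perfect sphere.  The sketch compares the midpoint of an edge when latitude and longitude are treated as plane coordinates with the midpoint of the great-circle arc between the same two vertices.

```python
import math


def flat_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint when lat/lon are treated directly as plane coordinates."""
    return ((lat1 + lat2) / 2.0, (lon1 + lon2) / 2.0)


def great_circle_midpoint(lat1, lon1, lat2, lon2):
    """Midpoint of the great-circle arc on a unit sphere.

    Converts both points to 3D unit vectors, adds them, and converts the sum
    back to latitude/longitude.  Undefined for antipodal points, where the
    vector sum is zero.
    """
    phi1, lam1, phi2, lam2 = map(math.radians, (lat1, lon1, lat2, lon2))
    x = math.cos(phi1) * math.cos(lam1) + math.cos(phi2) * math.cos(lam2)
    y = math.cos(phi1) * math.sin(lam1) + math.cos(phi2) * math.sin(lam2)
    z = math.sin(phi1) + math.sin(phi2)
    lat = math.atan2(z, math.hypot(x, y))
    lon = math.atan2(y, x)
    return (math.degrees(lat), math.degrees(lon))


if __name__ == "__main__":
    # Two vertices at 50 degrees north, first 120 and then 2 degrees of longitude apart.
    for lon in (60.0, 1.0):
        print(flat_midpoint(50.0, -lon, 50.0, lon),
              great_circle_midpoint(50.0, -lon, 50.0, lon))
```

For the vertices 120 degrees apart, the flat midpoint stays at 50 degrees north while the great-circle midpoint is at about 67 degrees north.  For the vertices 2 degrees apart, the two midpoints differ by less than a hundredth of a degree.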
  
