lucene-dev mailing list archives

From: Ryan McKinley <ryan...@gmail.com>
Subject: Lucene Spatial Future
Date: Sun, 03 Apr 2011 19:50:09 GMT
Hello-

I think it is worth discussing what we want to do with the Lucene
spatial contrib.

As anyone who has followed spatial development knows, the contrib
started with a large contribution and has never had much love or
attention.  Grant did some great work to get point search working in
Solr, but much of the Lucene spatial contrib (geometry and tier) does
not work as expected, and it is not clear that anyone intends to
improve or fix it.  I feel partly responsible for pushing the creation
of this contrib without ever really using it; the problems it solves
do not apply to my work, so I have not been able to give it much
attention.

Recently I have been working on what I hope a high-level Lucene
spatial API could look like.  This work has taken place at:
https://lucene-spatial-playground.googlecode.com/svn/trunk/

The key idea is that there are many 'strategies' for spatial indexing:
you may work with simple x/y coordinates, or you may need arbitrary
geometry in a crazy projection.  I hope a common API can be used for a
variety of approaches.  The core API works directly with Lucene and
has nice bindings to Solr.  The other key concern is a good testing
framework that works across all strategies.
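To make the 'strategies' idea concrete, here is a minimal, self-contained Java sketch (Java 16+ for records). Every name in it (SpatialStrategy, XYStrategy, GridStrategy, Point, Rect, run) is hypothetical and not taken from the playground code or from Lucene; indexing is simulated with in-memory maps rather than real Lucene fields. It shows two strategies behind one interface and one shared check that every strategy must pass.

```java
import java.util.*;

public class StrategyDemo {
    // Hypothetical minimal shapes; a real API would have richer geometry.
    record Point(double x, double y) {}

    record Rect(double minX, double minY, double maxX, double maxY) {
        boolean contains(Point p) {
            return p.x() >= minX && p.x() <= maxX && p.y() >= minY && p.y() <= maxY;
        }
    }

    // Hypothetical common API: each strategy decides how points are stored
    // and how a bounding-box query is answered.
    interface SpatialStrategy {
        void index(int docId, Point p);
        Set<Integer> within(Rect query);
    }

    // Strategy 1: keep raw x/y values and scan them all (brute force).
    static class XYStrategy implements SpatialStrategy {
        private final Map<Integer, Point> points = new HashMap<>();

        public void index(int docId, Point p) { points.put(docId, p); }

        public Set<Integer> within(Rect q) {
            Set<Integer> hits = new TreeSet<>();
            for (Map.Entry<Integer, Point> e : points.entrySet()) {
                if (q.contains(e.getValue())) hits.add(e.getKey());
            }
            return hits;
        }
    }

    // Strategy 2: bucket points into fixed-size grid cells and only
    // visit the cells that overlap the query rectangle.
    static class GridStrategy implements SpatialStrategy {
        private final double cellSize;
        private final Map<Long, Map<Integer, Point>> cells = new HashMap<>();

        GridStrategy(double cellSize) { this.cellSize = cellSize; }

        private long cellOf(double v) { return (long) Math.floor(v / cellSize); }

        // Pack two cell coordinates into one map key.
        private static long key(long cx, long cy) { return (cx << 32) | (cy & 0xffffffffL); }

        public void index(int docId, Point p) {
            cells.computeIfAbsent(key(cellOf(p.x()), cellOf(p.y())), k -> new HashMap<>())
                 .put(docId, p);
        }

        public Set<Integer> within(Rect q) {
            Set<Integer> hits = new TreeSet<>();
            for (long cx = cellOf(q.minX()); cx <= cellOf(q.maxX()); cx++) {
                for (long cy = cellOf(q.minY()); cy <= cellOf(q.maxY()); cy++) {
                    Map<Integer, Point> bucket = cells.get(key(cx, cy));
                    if (bucket == null) continue;
                    // Edge cells only partly overlap the query, so re-check each point.
                    for (Map.Entry<Integer, Point> e : bucket.entrySet()) {
                        if (q.contains(e.getValue())) hits.add(e.getKey());
                    }
                }
            }
            return hits;
        }
    }

    // Shared test harness: same data and same query for every strategy,
    // so all strategies are held to the same expected answer.
    static Set<Integer> run(SpatialStrategy s) {
        s.index(1, new Point(1, 1));
        s.index(2, new Point(5, 5));
        s.index(3, new Point(-3, 7));
        return s.within(new Rect(0, 0, 6, 6));
    }

    public static void main(String[] args) {
        List<SpatialStrategy> strategies = List.of(new XYStrategy(), new GridStrategy(2.0));
        for (SpatialStrategy s : strategies) {
            // prints "XYStrategy -> [1, 2]" then "GridStrategy -> [1, 2]"
            System.out.println(s.getClass().getSimpleName() + " -> " + run(s));
        }
    }
}
```

The grid strategy visits only the cells that overlap the query but still re-checks each point, because cells on the query's edge are only partly covered; a single shared check like run() is what would catch a strategy that gets that wrong.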

When I started this 'playground', I hoped to push for a spatial module
within the Apache repo.  I'm no longer sure that is the best path
forward, and I want to get other people's opinions.

My two concerns about a spatial module within Lucene are community and
3rd-party spatial tools:

1. Community: for the most part, developers involved in Lucene are
concerned with text search; spatial search is a nice-to-have feature,
but not something that gets serious attention.  I believe (perhaps
naively) that with some clever indexing, Lucene/Solr could be a
serious alternative to PostGIS.  Our development environment should
attract contributions from spatial folks.

2. 3rd-party tools: in general, people working on complex geographic
problems use JTS and other LGPL tools.  There is some great work
happening at Apache SIS now, but it is a long way from being a viable
ASL alternative.  Within the ASF, it is legally OK to have build
dependencies on LGPL code.  The Lucene bdb contrib even includes a
dependency on a GPL(ish) library!  However, these dependencies are not
recommended and only happen if the community (and PMC) think the
tradeoff is worthwhile.  Given the primary concerns of the Lucene
community, I totally understand why a build dependency on LGPL may not
be acceptable.

As designed, the proposed spatial API does not require JTS.  All
geometry has a 'simple' implementation that works for points and
boxes.  I even refactored the JTS dependencies into separate packages
that could be hosted elsewhere.  This works, but it is far from ideal
because it makes testing the JTS implementations much more difficult.
Given that the developers working on the code are likely to use the
complex implementations, this separation is unacceptable (for me,
anyway).  If I am going to spend the time to write good tests, I want
to make sure the classes I use are first-class citizens.  Likewise, if
the real work goes into testing the external packages, it stinks if
that work does not apply to the simple implementations and the base
framework.  I don't think the spatial developer community should be
split across two code bases and build systems.
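A rough sketch of what a JTS-free 'simple' geometry layer might look like (Java 16+), again with hypothetical names (Shape, PointShape, BoxShape, Relation, relateBoxes, checkContract) that are not taken from the playground, Lucene, or JTS. For points and boxes, comparing bounding boxes gives exact answers, so no external library is needed; an implementation backed by JTS (not shown, and assumed to live in a separate package) would override relate(). The contract check illustrates the testing concern: one set of assertions that the simple and the complex implementations could both be run through.

```java
public class SimpleShapes {
    // Hypothetical spatial relations between two shapes.
    enum Relation { DISJOINT, INTERSECTS, CONTAINS, WITHIN }

    // Hypothetical core abstraction; the base API depends only on this interface.
    interface Shape {
        BoxShape bbox();

        // Default: relate by bounding boxes. This is exact for points and boxes;
        // a JTS-backed implementation hosted elsewhere would override it.
        default Relation relate(Shape other) {
            return relateBoxes(bbox(), other.bbox());
        }
    }

    // A point is treated as a degenerate box, so it reuses the box logic.
    record PointShape(double x, double y) implements Shape {
        public BoxShape bbox() { return new BoxShape(x, y, x, y); }
    }

    record BoxShape(double minX, double minY, double maxX, double maxY) implements Shape {
        public BoxShape bbox() { return this; }
    }

    // Relation of box a to box b, from a's point of view.
    static Relation relateBoxes(BoxShape a, BoxShape b) {
        if (b.maxX() < a.minX() || b.minX() > a.maxX() || b.maxY() < a.minY() || b.minY() > a.maxY()) {
            return Relation.DISJOINT;
        }
        if (a.minX() <= b.minX() && b.maxX() <= a.maxX() && a.minY() <= b.minY() && b.maxY() <= a.maxY()) {
            return Relation.CONTAINS;
        }
        if (b.minX() <= a.minX() && a.maxX() <= b.maxX() && b.minY() <= a.minY() && a.maxY() <= b.maxY()) {
            return Relation.WITHIN;
        }
        return Relation.INTERSECTS;
    }

    // Hypothetical contract check: any Shape implementation, simple or
    // JTS-backed, could be run through the same assertions.
    static void checkContract(Shape outer, Shape inner, Shape far) {
        if (outer.relate(inner) != Relation.CONTAINS) throw new AssertionError("expected CONTAINS");
        if (inner.relate(outer) != Relation.WITHIN) throw new AssertionError("expected WITHIN");
        if (outer.relate(far) != Relation.DISJOINT) throw new AssertionError("expected DISJOINT");
    }

    public static void main(String[] args) {
        Shape box = new BoxShape(0, 0, 10, 10);
        checkContract(box, new PointShape(3, 4), new PointShape(20, 20));
        checkContract(box, new BoxShape(2, 2, 4, 4), new BoxShape(11, 11, 12, 12));
        System.out.println(box.relate(new BoxShape(5, 5, 15, 15))); // prints INTERSECTS
    }
}
```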

In the days of sub-projects, I would have proposed a spatial
sub-project, but now I see two options:

A.  Work on spatial Lucene outside of Apache, perhaps at OSGeo or even
just GitHub (it would need a different name).
B.  Allow a JTS compile-time dependency in Lucene, and move the spatial
contrib to a real module.

I think option A is better long term, but I feel like the kid saying
"if I can't have my way I'll take my ball (code) and go home".  I
don't want this to sound like an ultimatum; I want an honest discussion
about what has the best chance of fostering a thriving development
community.

If we do elect option A, I would also suggest we delete the spatial
contrib (in 4.0) and have Solr depend on the external .jar.  This way
Lucene users would get what they need directly from the external .jar,
and Solr users would get lots of fancy new stuff off the shelf.

Thoughts?  Ideas? Concerns?

thanks
ryan

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org

