lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Grant Ingersoll <gsing...@apache.org>
Subject Re: [spatial] Rethinking Cartesian Tiers implementation
Date Wed, 30 Jun 2010 13:36:58 GMT
Taking a step back, I do think it is worthwhile to document what I believe is the intent of
the Tiers/Tiles/Grid and how they work.

Here, in a nutshell, is my view of what Tiers does (note, this is separate from what the code
does, as it tries to make some optimizations/lazy decisions):

1. Take a sphere/ellipsoid and project it into a 2D space
2. For a given zoom level (call it n)
	a.  Slice the 2D space up into 2^n boxes
	b. Label each of those boxes with a unique id (this is done in a lazy way, as we don't literally
label all the boxes ahead of time)

During Indexing:
1. For each desired zoom level
	a. Given a latitude and longitude and a zoom level, add a term to a field for that zoom level
that contains the id of the box that the lat/lon is in.  This is, essentially, a feature reduction
step, especially in highly dense areas as it is now possible to represent a whole bunch of
things with a single term

During search:
1. Given an input lat/lon and a distance
	a. Determine the "best fit" zoom level that will allow us to exclude the as many documents
as possible using the fewest terms possible
	b. For that zoom level, get the ids of the boxes that cover the distance given
	c. Perform the search/filtering/sorting using the box ids

Now, obviously, the devil is in the details of all this.  One needs to know how to consistently
create the box ids for starters and also determine the best fit (we have two different approaches
for that, one committed and one in JIRA).  One also has to deal with pesky things like the
prime/180 meridian and the north and south pole in order to get this all right.

This approach has, I believe, some significant advantages, especially if one can do a good
job determining the best fit, as it can significantly reduce the number of terms that need
to be iterated and it also collapses lat/lon into a single field which also helps improve
search times.  The downside is it often takes up more index space due to the many zoom levels.

Does this match up with other people's understanding?

-Grant

On Jun 29, 2010, at 11:02 PM, Grant Ingersoll wrote:

> After having spent a lot of time looking at the Tier stuff and debugging it and discussing
it on IRC, I am of the opinion (and, I still have some belief that it's just me who isn't
getting it, but others have confirmed my suspicion) that the spatial Cartesian Tier implementation
is broken and, due to the lack of documentation and resources, it is not clear how to fix
it; furthermore we don't seem to have anyone with the bandwidth or interest to fix it (I've
tried, but I've about reached my limit on it in terms of willingness to work on it).  Even
if it is not broken and it is just me, the fact that I don't get it after spending as much
time on it that I have is not a good sign for maintainability.  
> 
> AFAICT, a lot of it stems from the fact that SinusoidalProjection is broken (https://issues.apache.org/jira/browse/LUCENE-2519)
and then layers upon layers of other, minor issues were built on top of that such that at
the end of those layers are unit tests that are nearly impossible to untangle and in between
is a fairly large mess.   Well, I suppose I should moderate those statements: I think all
of it probably works as long as you stay completely in a given quadrant of the Earth for all
your points (and likely really, just the US/ northern/western hemispheres), but even then
it only works b/c it consistently calculates the wrong values for all points.
> 
> At the same time, I think the theory behind it (i.e. a labeled, nested grid system) is
useful for many people solving spatial search problems and can offer significant query time
savings.  So, I think there are a couple of things we could do:
> 
> 1. Remove it and start from scratch and take a more standard approach
> 2. Find someone to fix the existing stuff who groks it
> 3. Document that it is only really useful for a given quadrant and then either do 1 or
2 over the long term.  We still need to fix somethings, IMO, to feel comfortable w/ it.
> 4. At a minimum, I think we should mark it as deprecated until it can be better dealt
with later
> 
> For now, I'm going to remove it from Solr and just focus on finishing up the filtering
stuff for the other point types.
> 
> Thoughts?  Volunteers?
> 
> -Grant
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: dev-help@lucene.apache.org


Mime
View raw message