lucene-general mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: [spatial] Cartesian "Tiers" nomenclature
Date Wed, 30 Dec 2009 18:27:57 GMT
Hi Mike,

> Right, NRQ is able to translate any requested range into the union
> (OR) of brackets (from the trie) created during indexing.
> 
> Can spatial do the same thing, just with 2D instead of 1D?  Ie,
> reconstruct any expressible shape (created at query time) as the union
> of some number of grids/tiers, at finer & finer levels, created during
> indexing?
> 
> Spatial, today, seems to do this, except it must also do "precise"
> filtering on each matching doc, because some of the grids may contain
> hits outside of the requested shape.

The problem in 2D is, that the number of small cart tiers (rect tiles) at
the corners of the BBOX can get very large. For NRQ it's very limited on
precisionStep, but here, the number of tiles can get very large for highest
precision (you can draw a picture to see this). So the number of terms will
get large, too, and so you can simply use NRQ in two dimensions to achieve
the same for non quadratic shaped bounding boxes. I tried to use spatial
tiers for large and non quadratic bounding boxes (like all datasets around
Africa). NRQ performed much better. Spatial is good for things like "find
all McBurger around position xy with distance d", but not for bounding box
searches.

Because of this the precision is limited for tiers, you filter the rest of
the docs. If the precision would be higher, the number of cart tiers
explodes at the corner of the BBOX.

> In fact, NRQ could also borrow from spatial's current approach -- ie,
> create the union of some smallish number of coarse brackets.  Some of
> the brackets will fall entirely within the requested range, and so
> require no further filtering, while others will fall part inside /
> part outside of the requested range, and so will require precise
> filtering.  If NRQ did this, it should have much fewer postings to
> enum, at the cost of having to do precise filtering on some of them
> (and we'd have to somehow encode the orig value in the index).

We thought about that for NRQ, too. It is still needed that you store the
full precision in the index. The idea is to use flexible indexing attributes
and/or payloads for this. You select the larger brackets and then go through
TermPositions and only enumerate docs, which payload/flexindex attribute
match. Another idea was provided by Yonik who said, the position could be
used instead of payload for the missing bits. The posincr is currently not
used by NRQ (its 1 for full prec and 0 for lower prec, so all trie terms
have pos=1).

> Mike
> 
> On Tue, Dec 29, 2009 at 8:42 PM, Yonik Seeley
> <yonik@lucidimagination.com> wrote:
> > On Tue, Dec 29, 2009 at 7:13 PM, Marvin Humphrey
> <marvin@rectangular.com> wrote:
> >> ... but for this algorithm, different rasterization resolutions need
> not
> >> proceed by powers-of-two.
> >
> > Indeed - one way to further generalize would be to use something like
> > Lucene's trie-based Numeric field, but with a square instead of a
> > line.  That would allow to tweak the space/speed tradeoff.
> >
> > -Yonik
> > http://www.lucidimagination.com
> >


Mime
View raw message