lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Helleringer, Nicolas" <nicolas.hellerin...@novacodex.net>
Subject Re: [SPATIAL] Best Fit Calculation
Date Wed, 14 Apr 2010 15:23:49 GMT
I ll try to find a little bit of time tonight to make a sample data set go
through the two calculations to see the differences.
I ll make a summary table.

I ll comment the issue with some comments on 'my' version of the alogrithm
right now.

Nicolas

2010/4/14 Chris Male <gento0nz@gmail.com>

> Hi,
>
> My understanding of the benefits of the new algorithm is that it means a
> lower tier level resulting in fewer boxes, but more documents inside those
> boxes that are outside of the search radius.
>
> While having fewer boxes means fewer term queries to make against the
> index, more documents means more costly calculations to filter out those
> extraneous documents.
>
> For those doing just Cartesian Tier filtering it seems like the new
> approach is a win, but for those doing distance calculations on those
> documents passing the filter, it seems to come at a cost.
>
> Cheers
> Chris
>
>
> On Wed, Apr 14, 2010 at 5:00 PM, Grant Ingersoll <gsingers@apache.org>wrote:
>
>> LUCENE-2359 changed the best fit calculation.  I admit, I'm not entirely
>> certain which one is right, so I thought we should step back and talk about
>> what we are trying to achieve.
>>
>> Please correct me if/where I am wrong.
>>
>> Looking at the problem of tiers/tiles/grids in general, we are taking a
>> sphere, projecting it into a 2D plane.  Next, we are dividing up the plane
>> into nested grids/tiers.  Each tier contains 2^tier id boxes.  Thus, tier
>> level 2 divides the earth up into 4 boxes.  2^15 = 32,768 boxes.  We then,
>> for each box, give it a unique label which then becomes the token that we
>> index.  During indexing, we typically will index many tiers, i.e. tiers 4
>> through 15.
>>
>> During search, we take in a lat/lon and a radius.  The goal is to do a
>> search using the fewest terms possible.  Thus, we need to pick the tier that
>> contains/covers the radius with the fewest number of boxes so that we can
>> enumerate a very small number of documents.  Thus, we need to calculate the
>> best fit, which is a method inside of the CartesianTierPlotter.
>>
>> In the old way, we did:
>>
>> bf = min( 15, ceil(log2(  earth_circumference /  ( ( miles/2) - sqrt(
>> (miles/2)^2  /  2 ) ) ) + 1 )   // we won't go higher than 15 for accuracy
>> reasons
>>
>> The new way is:
>>
>> bf' = ceil ( log2( earth_circumference / ( 2 * miles ) ) )
>>
>> These are obviously two different calculations, never mind the min(15)
>> issue, we can easily resolve that one.
>>
>> AFAICT, the new way is much less accurate, but will likely be faster.
>>
>> So, which is right?
>>
>> Unfortunately, I find almost zero documentation on this, probably b/c the
>> nomenclature is off, but...
>>
>> -Grant
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>>
>
>
> --
> Chris Male | Software Developer | JTeam BV.| www.jteam.nl
>

Mime
View raw message