I ll try to find a little bit of time tonight to make a sample data set go
through the two calculations to see the differences.
I ll make a summary table.
I ll comment the issue with some comments on 'my' version of the alogrithm
right now.
Nicolas
2010/4/14 Chris Male <gento0nz@gmail.com>
> Hi,
>
> My understanding of the benefits of the new algorithm is that it means a
> lower tier level resulting in fewer boxes, but more documents inside those
> boxes that are outside of the search radius.
>
> While having fewer boxes means fewer term queries to make against the
> index, more documents means more costly calculations to filter out those
> extraneous documents.
>
> For those doing just Cartesian Tier filtering it seems like the new
> approach is a win, but for those doing distance calculations on those
> documents passing the filter, it seems to come at a cost.
>
> Cheers
> Chris
>
>
> On Wed, Apr 14, 2010 at 5:00 PM, Grant Ingersoll <gsingers@apache.org>wrote:
>
>> LUCENE2359 changed the best fit calculation. I admit, I'm not entirely
>> certain which one is right, so I thought we should step back and talk about
>> what we are trying to achieve.
>>
>> Please correct me if/where I am wrong.
>>
>> Looking at the problem of tiers/tiles/grids in general, we are taking a
>> sphere, projecting it into a 2D plane. Next, we are dividing up the plane
>> into nested grids/tiers. Each tier contains 2^tier id boxes. Thus, tier
>> level 2 divides the earth up into 4 boxes. 2^15 = 32,768 boxes. We then,
>> for each box, give it a unique label which then becomes the token that we
>> index. During indexing, we typically will index many tiers, i.e. tiers 4
>> through 15.
>>
>> During search, we take in a lat/lon and a radius. The goal is to do a
>> search using the fewest terms possible. Thus, we need to pick the tier that
>> contains/covers the radius with the fewest number of boxes so that we can
>> enumerate a very small number of documents. Thus, we need to calculate the
>> best fit, which is a method inside of the CartesianTierPlotter.
>>
>> In the old way, we did:
>>
>> bf = min( 15, ceil(log2( earth_circumference / ( ( miles/2)  sqrt(
>> (miles/2)^2 / 2 ) ) ) + 1 ) // we won't go higher than 15 for accuracy
>> reasons
>>
>> The new way is:
>>
>> bf' = ceil ( log2( earth_circumference / ( 2 * miles ) ) )
>>
>> These are obviously two different calculations, never mind the min(15)
>> issue, we can easily resolve that one.
>>
>> AFAICT, the new way is much less accurate, but will likely be faster.
>>
>> So, which is right?
>>
>> Unfortunately, I find almost zero documentation on this, probably b/c the
>> nomenclature is off, but...
>>
>> Grant
>> 
>> To unsubscribe, email: javadevunsubscribe@lucene.apache.org
>> For additional commands, email: javadevhelp@lucene.apache.org
>>
>>
>
>
> 
> Chris Male  Software Developer  JTeam BV. www.jteam.nl
>
