lucene-java-user mailing list archives

From Chris Hostetter <>
Subject Re: Hacking proximity search: looking for feedback
Date Tue, 28 Feb 2006 22:12:36 GMT

: Very good points, I hadn't considered the term frequency of the digits
: affecting scoring.  As an aside, can that aspect of the score be ignored for
: these fields?

The easiest way is to use a boost that is so low it's insignificant, or
you could subclass TermQuery and override getSimilarity to return a
DelegateSimilarity which wraps the real instance and returns constant
values for things like tf() and idf() ... but i'm 95% sure that using a
RangeFilter (or a ConstantScoreRangeQuery) is going to be faster than all
of those TermQueries no matter what.
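
To make the wrapping idea concrete, here is a minimal sketch in plain Java.
ScoreFactors, DefaultFactors and ConstantTfIdfFactors are hypothetical names
made up for illustration; they stand in for the real Similarity so the
example runs without Lucene on the classpath, and the formulas in
DefaultFactors are placeholders too.

```java
// Hypothetical stand-in for the scoring hooks; NOT Lucene's Similarity class.
interface ScoreFactors {
    float tf(float freq);
    float idf(int docFreq, int numDocs);
    float lengthNorm(String field, int numTokens);
}

// A plausible "real" implementation, present only so the sketch runs.
class DefaultFactors implements ScoreFactors {
    public float tf(float freq) {
        return (float) Math.sqrt(freq);
    }
    public float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }
    public float lengthNorm(String field, int numTokens) {
        return (float) (1.0 / Math.sqrt(numTokens));
    }
}

// Wraps the real instance: tf and idf become constants, the rest is delegated.
class ConstantTfIdfFactors implements ScoreFactors {
    private final ScoreFactors delegate;

    ConstantTfIdfFactors(ScoreFactors delegate) {
        this.delegate = delegate;
    }
    public float tf(float freq) {
        return 1.0f; // ignore how often the term occurs in the document
    }
    public float idf(int docFreq, int numDocs) {
        return 1.0f; // ignore how rare the term is across the corpus
    }
    public float lengthNorm(String field, int numTokens) {
        return delegate.lengthNorm(field, numTokens); // unchanged behaviour
    }
}
```

With the wrapper in place, a digit that shows up in many documents scores
the same as a rare one, which is the effect being asked about. In a real
index the wrapper would extend Similarity and be returned from the
overridden getSimilarity.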

: I need to spend more time with FunctionQuery, I haven't given it the
: attention it deserves.

i would start by trying out an apples to apples comparison of your current
approach with one where your index only has one indexed field each for
long/lat that uses ConstantScoreRangeQuery to do the boxing.  Compare the
size of the resulting indexes, the memory footprint while open, and the
time spent executing comparable queries.  You should probably compare
queries that involve both large boxes and small boxes, and if your usage
pattern means many boxes will be reused frequently, consider caching your
Filters.
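
As a rough illustration of caching reusable boxes, the sketch below keeps
one BitSet of matching doc ids per box in a small LRU map.  BoxFilterCache
and its plain lat/lon arrays are assumptions standing in for a real index
and Filter; it shows the caching pattern only, not Lucene's API.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical cached box filter; plain arrays replace the index.
class BoxFilterCache {
    private final double[] lats;
    private final double[] lons;
    private final Map<String, BitSet> cache;
    int computations = 0; // how many times a box was actually scanned

    BoxFilterCache(double[] lats, double[] lons, final int maxEntries) {
        this.lats = lats;
        this.lons = lons;
        // An access-ordered LinkedHashMap gives simple LRU eviction.
        this.cache = new LinkedHashMap<String, BitSet>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, BitSet> eldest) {
                return size() > maxEntries;
            }
        };
    }

    // Returns the doc ids whose point lies inside the box (bounds inclusive).
    // The returned BitSet is shared with the cache, so callers must not modify it.
    BitSet bits(double minLat, double maxLat, double minLon, double maxLon) {
        String key = minLat + "," + maxLat + "," + minLon + "," + maxLon;
        BitSet cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        BitSet bits = new BitSet(lats.length);
        for (int doc = 0; doc < lats.length; doc++) {
            if (lats[doc] >= minLat && lats[doc] <= maxLat
                    && lons[doc] >= minLon && lons[doc] <= maxLon) {
                bits.set(doc);
            }
        }
        computations++;
        cache.put(key, bits);
        return bits;
    }
}
```

The computations counter is only there so a benchmark can tell cache hits
from real scans; whether the cache pays off depends on how often the same
box comes back.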

once you've found the "best" way to do your boxing ... then look into
using FunctionQueries to influence your scores based on distance from the
center of the box.
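
One possible shape for such a function, sketched in plain Java: the score
is 1.0 at the center of the box and falls to 0.0 at its corners.
BoxDistanceScore and its parameters are hypothetical, and the sketch
assumes flat degree-space distance (no longitude wrap-around, no
correction for latitude), which is only reasonable for small boxes.

```java
// Hypothetical distance-based score for points inside a bounding box.
class BoxDistanceScore {
    // halfHeight and halfWidth are the box's half-extents in degrees.
    static double score(double lat, double lon,
                        double centerLat, double centerLon,
                        double halfHeight, double halfWidth) {
        // Scale each axis by the box size so tall or wide boxes are treated evenly.
        double dy = Math.abs(lat - centerLat) / halfHeight;
        double dx = Math.abs(lon - centerLon) / halfWidth;
        // Normalised distance: 0 at the center, 1 at a corner of the box.
        double d = Math.sqrt(dx * dx + dy * dy) / Math.sqrt(2.0);
        // Clamp so points outside the box never get a negative score.
        return Math.max(0.0, 1.0 - d);
    }
}
```

Scaling by the half-extents keeps the score comparable between tall and
wide boxes; a true great-circle distance could be swapped in if the boxes
get large.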

: Great feedback, thanks for the notes.
: -- jeff
: On 2/28/06, Chris Hostetter <> wrote:
: >
: >
: > : Geo definition:
: > : Boxing around a center point.  It's not critical to do a radius search
: > : with a given circle.  A boxed approach allows for taller or wider frames
: > : of reference, which are applicable for our use.
: >
: > if you are just looking to confine your results to a box then i think
: > RangeFiltering on both the X and Y axis will be more efficient than the
: > individual term queries you are producing.
: >
: > It will have the added bonus of not artificially affecting the scores of
: > the documents based on how often a particular digit appears in a particular
: > position of the latitude across your corpus.
: >
: > Once you've filtered down to a particular bounding box, you might consider
: > going back to the function query approach to score documents inside that
: > box based on their actual distance from the center point.  I don't recall
: > at the moment but i believe FunctionQuery's Scorer supports skipTo in such
: > a way that it won't bother computing the function for a document that has
: > been skipped (ie: when contained in a BooleanQuery with another clause
: > that has already prohibited it, or when executed in the context of a
: > Filter)
: >
: >
: >
: > -Hoss