lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: svn commit: r332747 - in /lucene/java/trunk: ./ src/java/org/apache/lucene/search/regex/ src/test/org/apache/lucene/search/regex/
Date Tue, 15 Nov 2005 20:41:17 GMT
Yonik Seeley wrote:
> Scoring recap... I think I've seen 4 different types of scoring
> mentioned in this thread for a term expanding query on a single field:
> 
> 1) query_boost
> 2) query_boost * (field_boost * lengthNorm)
> 3) query_boost * (field_boost * lengthNorm) * tf(t in q)
> 4) query_boost * (field_boost * lengthNorm) * tf(t in q) * idf(t in q)
> 
> 1 & 2 can be done with ConstantScoreQuery
> 4 is currently done via rewrite to BooleanQuery and limiting the
> number of terms.
> 3 is unimplemented AFAIK.

3 is easy to implement as a subcase of 4, no?
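
To make the "subcase" point concrete, here is a rough sketch of the four variants from Yonik's list as one function. This is not Lucene code; `term_score` and `score_variant_3` are hypothetical names, and the factors are passed in as plain numbers rather than computed from an index.

```python
def term_score(query_boost, field_boost=1.0, length_norm=1.0, tf=1.0, idf=1.0):
    """Score contribution of one expanded term under variant 4.

    Variants 1-3 fall out by leaving the trailing factors at their
    neutral value of 1.0 (hypothetical helper, not a Lucene API).
    """
    return query_boost * (field_boost * length_norm) * tf * idf


def score_variant_3(query_boost, field_boost, length_norm, tf):
    # Variant 3 is variant 4 with the idf factor forced to 1.0.
    return term_score(query_boost, field_boost, length_norm, tf, idf=1.0)
```

Calling `term_score(query_boost)` alone gives variant 1, and adding `field_boost` and `length_norm` gives variant 2.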

The challenge is to implement 3 or 4 efficiently for very large queries
without using gobs of RAM.  One option is to keep a score per document,
which makes RAM use proportional to the size of the collection (or at
least to the number of non-zero matches, if a sparse representation is
used).  The other, as in 4, makes RAM use proportional to the number of
terms in the query (with a large constant--an i/o buffer).
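
A minimal sketch of the score-per-document option, in both dense and sparse form. Everything here is an assumption made for illustration: postings are modelled as an in-memory dict from term to a list of `(doc_id, tf)` pairs, `weight` is a caller-supplied function standing in for the per-term factors, and none of the names correspond to Lucene classes.

```python
from collections import defaultdict


def accumulate_dense(postings_by_term, max_doc, weight):
    """One float per document: memory grows with the collection size."""
    scores = [0.0] * max_doc
    # Terms are processed one at a time, so only one postings list
    # needs to be open at any moment.
    for term, postings in postings_by_term.items():
        w = weight(term)
        for doc_id, tf in postings:
            scores[doc_id] += w * tf
    return scores


def accumulate_sparse(postings_by_term, weight):
    """One entry per matching document: memory grows with non-zero matches."""
    scores = defaultdict(float)
    for term, postings in postings_by_term.items():
        w = weight(term)
        for doc_id, tf in postings:
            scores[doc_id] += w * tf
    return dict(scores)
```

Either way the memory cost does not depend on how many terms the query expands to, which is the trade against the BooleanQuery approach in 4.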

Doug


