lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: SweetSpotSimiliarity
Date Wed, 24 May 2006 16:33:35 GMT
Marvin Humphrey wrote:
> The only answer seems to be to apply different lengthNorm algos to  
> different fields.

FYI, Nutch uses the following:

http://svn.apache.org/viewvc/lucene/nutch/trunk/src/java/org/apache/nutch/indexer/NutchSimilarity.java?view=markup
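As a rough illustration of applying a different length norm to each field, here is a minimal standalone sketch. It is not the NutchSimilarity source linked above: the class name, the field names, the MIN_CONTENT_TOKENS floor, and the choice of formula per field are all hypothetical, and 1/sqrt(numTokens) is assumed as the fallback norm.

```java
public class PerFieldLengthNorm {
    // Hypothetical floor: bodies shorter than this are scored as if they had this many tokens.
    static final int MIN_CONTENT_TOKENS = 1000;

    /** Returns an illustrative length norm for the given field name and token count. */
    static float lengthNorm(String field, int numTokens) {
        int n = Math.max(numTokens, 1); // guard against empty fields
        switch (field) {
            case "url":
                // Strongly prefer short values: linear decay.
                return 1.0f / n;
            case "anchor":
                // Penalize length only gently: logarithmic decay.
                return (float) (1.0 / Math.log(Math.E + n));
            case "content":
                // Do not reward very short bodies: clamp the length from below.
                return (float) (1.0 / Math.sqrt(Math.max(n, MIN_CONTENT_TOKENS)));
            default:
                // Assumed fallback: inverse square root of the token count.
                return (float) (1.0 / Math.sqrt(n));
        }
    }
}
```

In a real index the same branching would live in a Similarity subclass; the constants here are placeholders to be tuned, not recommended values.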

These values are all seat-of-the-pants, developed by hand-tuning a few
queries.  Like code optimization, relevance tuning is better done with
large amounts of real data.  If you have trusted relevant/non-relevant
judgements for a representative sample of queries, then you can do a
much better job of setting these parameters.  Unfortunately, such
judgements are expensive to generate.
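One way to use such judgements, sketched here under assumptions: score each candidate parameter setting by the mean average precision of its rankings over the judged queries, and keep the setting that scores highest. The class and method names are hypothetical, document ids are plain strings, and each ranked list is assumed to contain no duplicates.

```java
import java.util.List;
import java.util.Set;

public class JudgementEval {
    /** Average precision of one ranked result list against the judged-relevant document ids. */
    static double averagePrecision(List<String> ranked, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0; // no judged-relevant documents for this query
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (relevant.contains(ranked.get(i))) {
                hits++;
                sum += (double) hits / (i + 1); // precision at this rank
            }
        }
        return sum / relevant.size();
    }

    /** Mean of averagePrecision over queries; rankings.get(q) pairs with judgements.get(q). */
    static double meanAveragePrecision(List<List<String>> rankings, List<Set<String>> judgements) {
        if (rankings.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (int q = 0; q < rankings.size(); q++) {
            total += averagePrecision(rankings.get(q), judgements.get(q));
        }
        return total / rankings.size();
    }
}
```

Running the same query set under two parameter settings and comparing the two means gives a number to tune against instead of eyeballing a few result pages.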

For Web data, one source of relevance judgements is:

http://ir.dcs.gla.ac.uk/test_collections/

Doug
