lucene-dev mailing list archives

From: Doug Cutting <cutt...@lucene.com>
Subject: Re: too many hits - OutOfMemoryError; Low frequency terms
Date: Thu, 29 May 2003 21:30:39 GMT
Ype Kingma wrote:
> Terms that inadvertently have a low document frequency (spelling
> errors, for example) get a term relevancy in query execution that
> is higher than they actually deserve.
> This problem surfaces when term expansion produces such terms.
> Is there a way in Lucene to give all expanded terms the same relevancy?
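A minimal sketch of why rare terms are over-weighted, assuming the IDF formula of Lucene's DefaultSimilarity, idf = ln(numDocs / (docFreq + 1)) + 1, reproduced here in plain Java without Lucene. The collection size, document frequencies and example words are hypothetical. With 10,000 documents, a correctly spelled term found in 500 of them gets an IDF of roughly 4.0, while a misspelling found in a single document gets roughly 9.5.

```java
// Standalone sketch (not Lucene code) of a DefaultSimilarity-style IDF:
// idf = ln(numDocs / (docFreq + 1)) + 1
class IdfDemo {
    static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        int numDocs = 10000;
        // Hypothetical document frequencies for a correct spelling and a typo.
        float common = idf(500, numDocs); // e.g. "retrieval"
        float typo = idf(1, numDocs);     // e.g. "retreival"
        System.out.printf("idf(df=500) = %.3f%n", common);
        System.out.printf("idf(df=1)   = %.3f%n", typo);
    }
}
```

The lower the document frequency, the larger the IDF, so a typo produced by term expansion outweighs the term the user most likely meant.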

You could override Similarity.idf(Term, Searcher) to return a constant,
so that all query terms get the same weight. Or, if you only wanted to
apply this to expanded queries, you could change your term expander so
that each term's boost is set to 1/Similarity.idf(term, searcher), in
order to cancel the effect of IDF for just the expanded terms.

http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#idf(org.apache.lucene.index.Term,%20org.apache.lucene.search.Searcher)
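The two remedies can be compared in a small standalone Java model. This is not Lucene code: it assumes a DefaultSimilarity-style IDF, idf = ln(numDocs / (docFreq + 1)) + 1, and models a term's weight simply as boost * idf. The helper names (flatIdf, expansionBoost, weight) and the term frequencies are hypothetical. In Lucene itself, the first remedy would presumably be a DefaultSimilarity subclass overriding idf(Term, Searcher) and installed with Searcher.setSimilarity, and the second would call Query.setBoost on each expanded TermQuery; check the API of the version you use.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Standalone model (not Lucene code). Assumption: weight = boost * idf.
class ExpansionWeights {
    // DefaultSimilarity-style IDF.
    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Remedy 1: an IDF that ignores document frequency entirely,
    // so every term gets the same weight.
    static float flatIdf(int docFreq, int numDocs) {
        return 1.0f;
    }

    // Remedy 2: keep the default IDF, but boost each expanded term by 1/idf.
    static float expansionBoost(int docFreq, int numDocs) {
        return 1.0f / defaultIdf(docFreq, numDocs);
    }

    // Weight of one term under the simplified model.
    static float weight(float boost, float idf) {
        return boost * idf;
    }

    public static void main(String[] args) {
        int numDocs = 10000;
        // Hypothetical expansions of one query term and their document frequencies.
        Map<String, Integer> expanded = new LinkedHashMap<>();
        expanded.put("retrieval", 500);
        expanded.put("retreival", 1);
        for (Map.Entry<String, Integer> e : expanded.entrySet()) {
            int df = e.getValue();
            float idf = defaultIdf(df, numDocs);
            System.out.printf("%-10s default=%.3f flat=%.3f boosted=%.3f%n",
                e.getKey(),
                weight(1.0f, idf),
                weight(1.0f, flatIdf(df, numDocs)),
                weight(expansionBoost(df, numDocs), idf));
        }
    }
}
```

The flat IDF changes scoring for every query run through that searcher, while the 1/idf boost touches only the expanded terms, which is why it fits the case where only expanded queries should be affected. Lucene's actual scoring also applies query normalization and may use IDF in more than one factor of a term's score, so verify the scoring formula of your version before relying on an exact cancellation.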

Doug

