lucene-dev mailing list archives

From Ype Kingma <ykin...@xs4all.nl>
Subject Re: too many hits - OutOfMemoryError; Low frequency terms
Date Fri, 30 May 2003 20:52:49 GMT
Doug,

On Thursday 29 May 2003 14:30, Doug Cutting wrote:
> Ype Kingma wrote:
> > Terms that inadvertently have a low document frequency (spelling
> > errors, for example) get a term relevancy in query execution that
> > is higher than they actually deserve.
> > This problem surfaces when term expansion results in such terms.
> > Is there a way in Lucene to give all expanded terms the same relevancy?
>
> You could override Similarity.idf(Term, Searcher), so that all query
> terms get the same weight.  Or, if you only wanted to apply this to

That seems to be overdoing it a bit.

> expanded queries, you could change your term expander so that each
> term's boost is set to 1/Similarity.idf(term, searcher), in order to
> cancel the effect of IDF for just the expanded terms.

> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/search/Similarity.html#idf(org.apache.lucene.index.Term,%20org.apache.lucene.search.Searcher)

I'll have a look at a change in term expansion.
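
To make the boost idea concrete, here is a minimal standalone sketch of the arithmetic only. It does not call Lucene: the class and method names (IdfBoostSketch, idf, cancellingBoost) are hypothetical, and the formula log(numDocs / (docFreq + 1)) + 1 is assumed as a stand-in for whatever Similarity.idf actually returns. Whether a boost of 1/idf fully cancels IDF in a real scorer depends on how that scorer applies idf, which is not modelled here.

```java
public class IdfBoostSketch {

    // Assumed IDF formula for illustration: log(numDocs / (docFreq + 1)) + 1.
    // A real Similarity implementation may compute this differently.
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // Boost for an expanded term, chosen so that idf * boost == 1.
    public static float cancellingBoost(int docFreq, int numDocs) {
        return 1.0f / idf(docFreq, numDocs);
    }

    public static void main(String[] args) {
        int numDocs = 100000;               // hypothetical index size
        int[] docFreqs = {1, 50, 5000};     // a misspelling, a rare term, a common term

        for (int docFreq : docFreqs) {
            float idf = idf(docFreq, numDocs);
            float boost = cancellingBoost(docFreq, numDocs);
            // After boosting, every expanded term carries the same idf * boost weight.
            System.out.println("docFreq=" + docFreq
                    + " idf=" + idf
                    + " boost=" + boost
                    + " idf*boost=" + (idf * boost));
        }
    }
}
```

In a term expander, the boost computed this way would be set on each expanded term's query, so that a misspelling with docFreq 1 no longer outweighs the common spellings it was expanded alongside.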
Thanks a lot.

Ype
