lucene-java-user mailing list archives

From markharw00d <markharw...@yahoo.co.uk>
Subject Re: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
Date Mon, 09 Jul 2007 19:59:15 GMT
 >>So I'm afraid I can't use the technique you recommend.

Ah, right - so the TermVector you read from the index will return both
mixed-case and lower-case versions of the same text.
One point to note: this means that, of the 25 or so top terms that
MoreLikeThis selects for querying, there is a reasonable chance you
would be selecting both the mixed-case and lower-case versions of the
same term. Given such duplication, any match on a mixed-case term will
also match on the lower-cased version, so you are likely to end up with
redundant query terms that both slow down your searches and limit the
variety of terms that make the cut into the top 25.
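
A rough sketch of the crowding effect, in plain Java rather than the
MoreLikeThis API. Everything here is made up for illustration: the class
name TopTermsSketch, its helper methods, the sample scores, and the cut-off
of 4 standing in for the 25 or so terms. Summing the scores of the two case
variants is also just an assumption to keep the example small; it is not a
claim about how MoreLikeThis scores terms.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class TopTermsSketch {

    // Hypothetical sample data: each term appears in mixed and lower case.
    static Map<String, Integer> sampleScores() {
        Map<String, Integer> scores = new LinkedHashMap<>();
        scores.put("Lucene", 9);
        scores.put("lucene", 8);
        scores.put("Index", 7);
        scores.put("index", 6);
        scores.put("query", 5);
        scores.put("vector", 4);
        return scores;
    }

    // Pick the n highest-scoring terms; ties are broken alphabetically.
    static List<String> topTerms(Map<String, Integer> scores, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(scores.entrySet());
        entries.sort((a, b) -> {
            int byScore = Integer.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            top.add(entries.get(i).getKey());
        }
        return top;
    }

    // Fold case variants into one lower-cased term (scores summed by assumption).
    static Map<String, Integer> mergeCase(Map<String, Integer> scores) {
        Map<String, Integer> merged = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> e : scores.entrySet()) {
            merged.merge(e.getKey().toLowerCase(Locale.ROOT), e.getValue(), Integer::sum);
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Integer> raw = sampleScores();
        // With case variants kept apart, duplicates take up the available slots.
        System.out.println("raw:    " + topTerms(raw, 4));
        // With variants folded together, distinct terms make the cut instead.
        System.out.println("merged: " + topTerms(mergeCase(raw), 4));
    }
}
```

With the variants kept apart, all four slots go to two underlying terms;
once they are folded together, "query" and "vector" make the cut as well.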

Cheers
Mark

