lucene-java-user mailing list archives

From "Jong Kim" <j...@sitescape.com>
Subject RE: Stop-words comparison in MoreLikeThis class in Lucene's contrib/queries project
Date Mon, 09 Jul 2007 19:45:27 GMT
Mark,

I understand your point.
However, we do not maintain a separate field for the lower-case version of
the words.
Instead, we index each word twice at the same position within the same field:
once as written and once in lower case. This gives a case-exact match for
search queries containing upper-case characters, and a case-insensitive match
for search queries given entirely in lower case.
So I'm afraid I can't use the technique you recommend.
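
To make the indexing scheme above concrete, here is a minimal sketch in plain Java. It does not use the Lucene API; the class DualCaseIndexSketch and its methods are hypothetical names invented for illustration, and the whitespace tokenization is an assumption. It only models the idea of storing the original token and its lower-cased form at the same position, then looking query terms up verbatim.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy positional index (hypothetical, not Lucene): each token is stored
 *  as written and, if different, lower-cased at the same position. */
class DualCaseIndexSketch {
    // term -> positions at which the term occurs
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    /** Index whitespace-separated tokens; both case variants share one position. */
    void index(String text) {
        int position = 0;
        for (String token : text.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // empty input yields a single empty token
            }
            add(token, position);
            String lower = token.toLowerCase(Locale.ROOT);
            if (!lower.equals(token)) {
                add(lower, position); // same position, i.e. no position advance
            }
            position++;
        }
    }

    private void add(String term, int position) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(position);
    }

    /** Query terms are looked up verbatim; no case folding at query time. */
    boolean matches(String queryTerm) {
        return postings.containsKey(queryTerm);
    }

    Set<Integer> positions(String queryTerm) {
        return postings.getOrDefault(queryTerm, Collections.emptySet());
    }

    public static void main(String[] args) {
        DualCaseIndexSketch idx = new DualCaseIndexSketch();
        idx.index("Apache Lucene rocks");
        System.out.println(idx.matches("Lucene")); // exact-case query
        System.out.println(idx.matches("lucene")); // all-lower-case query
        System.out.println(idx.matches("LUCENE")); // different casing
    }
}
```

In this sketch an all-lower-case query term finds every casing of the word, while a query term containing upper-case characters finds only the exact casing. It also shows why a stop-word list compared case-sensitively sees both variants as distinct terms in the same field.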

/Jong

-----Original Message-----
From: markharw00d [mailto:markharw00d@yahoo.co.uk] 
Sent: Monday, July 09, 2007 3:13 PM
To: java-user@lucene.apache.org
Subject: Re: Stop-words comparison in MoreLikeThis class in Lucene's
contrib/queries project

 >>the case matters only for those words that should be included.

Jong, just want to check we're on the same page - you do know MoreLikeThis
has a kind of automatic stop-wording built in, yes?
MoreLikeThis looks at the document frequency of all terms in the "this" 
text you provide and only selects a shortlist (up to maxQueryTerms) of the
rarer words. As such, users (admin or otherwise) surrender precise control
over what terms are used, hence my earlier question: does case really matter in
this 'inexact' scenario, and can you use the lower-case version of the
field you said you already create?
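
As a rough illustration of the shortlisting described above, here is a minimal sketch in plain Java. It is not the MoreLikeThis implementation: the class RareTermShortlistSketch, the method shortlist, and the sample document frequencies are all hypothetical, and the sketch ranks by document frequency alone, which is a simplification of whatever weighting the real class applies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: keep only the rarest terms of the "this" text. */
class RareTermShortlistSketch {

    /**
     * Returns up to maxQueryTerms terms, rarest first.
     * docFreqByTerm maps each term of the input text to the number of
     * indexed documents containing it (assumed to be known already).
     */
    static List<String> shortlist(Map<String, Integer> docFreqByTerm, int maxQueryTerms) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreqByTerm.entrySet());
        // Lowest document frequency first; ties broken alphabetically for determinism.
        entries.sort((a, b) -> {
            int byFreq = Integer.compare(a.getValue(), b.getValue());
            return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
        });
        List<String> result = new ArrayList<>();
        int limit = Math.max(0, Math.min(maxQueryTerms, entries.size()));
        for (int i = 0; i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> docFreq = new HashMap<>();
        docFreq.put("the", 1000);     // very common, behaves like a stop word
        docFreq.put("and", 900);
        docFreq.put("lucene", 12);
        docFreq.put("stopword", 3);
        System.out.println(shortlist(docFreq, 2));
    }
}
```

Very common words fall off the end of the shortlist without any explicit stop-word list, which is the "automatic" behaviour referred to above. Note that in this sketch terms differing only in case are separate map keys, so each variant is ranked on its own frequency.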

Cheers
Mark




---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org



