lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Gerard Sychay" <Gerard.Syc...@cchmc.org>
Subject Re: similarity of two texts - another question
Date Wed, 02 Jun 2004 18:22:23 GMT
Hmm, the term vector does not have to consist of only term frequencies,
does it? To give weight to rare terms, could you create a term vector of
(TF*IDF) values for each term?  Then, a distance function would measure
how many terms two vectors have in common, giving weight to how many
rare terms two vectors have in common.

>>> David Spencer <dave-lucene-user@tropo.com> 06/01/04 08:25PM >>>
Erik Hatcher wrote:

> On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote:
>
>> Well, a question again, how does Lucene compute the score between a 

>> document and a query?
>

And I might add, thus, this approach to similarity gives more weight to

rare terms that match, which one might want for this kind of similarity

measure.



---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message