lucene-java-user mailing list archives

From "Gerard Sychay"
Subject Re: similarity of two texts - another question
Date Wed, 02 Jun 2004 18:22:23 GMT
Hmm, the term vector does not have to consist of only term frequencies,
does it? To give weight to rare terms, could you create a term vector of
(TF*IDF) values for each term?  Then, a distance function would measure
how many terms two vectors have in common, giving weight to how many
rare terms two vectors have in common.
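
To make the idea concrete, here is a minimal sketch in plain Python (not Lucene code). It assumes each text is already tokenized into a list of terms, uses a smoothed IDF of log(N/df) + 1, and uses cosine similarity as the "distance function"; all of those choices, and the names tfidf_vectors and cosine_similarity, are illustrative assumptions rather than anything Lucene itself does.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one {term: tf * idf} dict per tokenized document.

    `docs` is a list of token lists. The IDF used here, log(N / df) + 1,
    is an assumption for illustration; Lucene's own formula differs.
    """
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {term: tf[term] * (math.log(n_docs / df[term]) + 1.0) for term in tf}
        )
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse {term: weight} vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    corpus = [
        ["the", "lucene", "index"],
        ["the", "lucene", "query"],
        ["the", "cat", "sat"],
        ["the", "dog", "ran"],
    ]
    vecs = tfidf_vectors(corpus)
    # Documents 0 and 1 share the rarer term "lucene" as well as "the";
    # documents 0 and 2 share only the common term "the".
    print(cosine_similarity(vecs[0], vecs[1]))
    print(cosine_similarity(vecs[0], vecs[2]))
```

Because "the" occurs in every document it gets the smallest weight, so sharing it alone adds little to the similarity, while sharing the rarer "lucene" adds considerably more.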

>>> David Spencer <> 06/01/04 08:25PM >>>
Erik Hatcher wrote:

> On Jun 1, 2004, at 4:41 PM, uddam chukmol wrote:
>> Well, a question again, how does Lucene compute the score between a
>> document and a query?

And I might add, thus, this approach to similarity gives more weight to
rare terms that match, which one might want for this kind of similarity
