lucene-java-user mailing list archives

From Dharmalingam
Subject Vector Space Model: New Similarity Implementation Issues
Date Tue, 26 Feb 2008 20:45:20 GMT

Hi List,

I am pretty new to Lucene. Certainly, it is very exciting. I need to
implement a new Similarity class based on the Term Vector Space Model given

Although that model is similar to Lucene’s model,
I am having a hard time extending the Similarity class to calculate that

In that model, “tf” is multiplied by “idf” for all terms in the index, but
in Lucene “tf” is calculated only for terms in the given Query. Because of
that, the norm calculation should also include “idf” for all terms.
Lucene calculates the norm during indexing by “just” counting the number
of terms per document. In the web formula (in, a document norm
is calculated after multiplying “tf” and “idf”.
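
To make the difference between the two norms concrete, here is a minimal sketch in plain Java that uses no Lucene classes. The class name TfIdfNormSketch, the helper names, and the choice idf = log(N / df) are assumptions for illustration only, since the exact formula is not reproduced in this message.

```java
import java.util.HashMap;
import java.util.Map;

public class TfIdfNormSketch {

    // Assumed classic idf: log(numDocs / docFreq). The formula meant in the
    // post may differ; swap in the real one if so.
    static double idf(int docFreq, int numDocs) {
        return Math.log((double) numDocs / docFreq);
    }

    // Norm that only counts terms: 1 / sqrt(total number of terms in the doc).
    // This mirrors the "just counting" behaviour described in the post.
    static double countNorm(Map<String, Integer> termFreqs) {
        int numTerms = 0;
        for (int freq : termFreqs.values()) {
            numTerms += freq;
        }
        return numTerms == 0 ? 0.0 : 1.0 / Math.sqrt(numTerms);
    }

    // Norm computed after weighting: 1 / sqrt(sum over terms of (tf * idf)^2).
    // Note that it needs document frequencies from the whole index.
    static double tfIdfNorm(Map<String, Integer> termFreqs,
                            Map<String, Integer> docFreqs,
                            int numDocs) {
        double sumOfSquares = 0.0;
        for (Map.Entry<String, Integer> e : termFreqs.entrySet()) {
            double weight = e.getValue() * idf(docFreqs.get(e.getKey()), numDocs);
            sumOfSquares += weight * weight;
        }
        return sumOfSquares == 0.0 ? 0.0 : 1.0 / Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // Hypothetical document: "a" occurs 3 times, "b" once.
        Map<String, Integer> termFreqs = new HashMap<>();
        termFreqs.put("a", 3);
        termFreqs.put("b", 1);

        // Hypothetical index of 10 documents: "a" is in all of them, "b" in one.
        Map<String, Integer> docFreqs = new HashMap<>();
        docFreqs.put("a", 10);
        docFreqs.put("b", 1);

        System.out.println("count-based norm:  " + countNorm(termFreqs));
        System.out.println("tf-idf-based norm: " + tfIdfNorm(termFreqs, docFreqs, 10));
    }
}
```

The second method shows where the difficulty lies: the weighted norm depends on index-wide document frequencies, whereas a norm computed from the term count alone needs nothing beyond the document itself.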

FYI: I could implement “idf” according to the formula, but not the
“tf” and “norm”.

Could you please advise me on how I can implement a new Similarity class that
fits into Lucene’s architecture but still implements the vector space
model given in

Thanks a lot for your comments,

