lucene-java-user mailing list archives

From Damian Birchler
Subject Overriding DefaultSimilarity to not consider tf/idf and friends
Date Mon, 05 Nov 2012 10:26:52 GMT
Hi everyone

We are using Lucene to search for possible duplicates in an address database. We create an
index with a document for each person in the database. Each document has a field with one
term for the first name, a field with one term for the last name and so on. I think in this
setting it doesn't make sense to let term frequency, inverse document frequency and friends
influence the document score (or does it?). For this reason I'm thinking of overriding DefaultSimilarity
to not take tf/idf into account when scoring.

Do you think that's a reasonable thing to do? If so, how should I proceed? I'm looking for
implementation details here. Should I, for example, override the method that calculates the term
frequency to just return a constant? Off the top of my head, though, I wouldn't know what
a sensible constant would be.
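
To make the question concrete, here is a minimal, self-contained sketch of the idea. It deliberately
does not use Lucene's classes: TermWeights, ClassicWeights, FlatWeights and termScore are hypothetical
names made up for illustration. The "classic" formulas are the ones documented for DefaultSimilarity
(tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))), and the per-term score is simplified to
tf * idf^2, ignoring norms, boosts, coord and queryNorm. I believe the corresponding hooks in
DefaultSimilarity are its tf and idf methods, but their exact signatures differ between Lucene
versions, so treat this as an assumption rather than working Lucene code.

```java
public class FlatScoringSketch {

    /** Hypothetical stand-in for the two per-term factors a similarity supplies. */
    interface TermWeights {
        double tf(double freq);
        double idf(long docFreq, long numDocs);
    }

    /** Formulas documented for Lucene's DefaultSimilarity. */
    static final class ClassicWeights implements TermWeights {
        public double tf(double freq) {
            return Math.sqrt(freq);
        }
        public double idf(long docFreq, long numDocs) {
            return 1.0 + Math.log((double) numDocs / (docFreq + 1));
        }
    }

    /** Flat variant: a matching term contributes 1, however rare or common it is. */
    static final class FlatWeights implements TermWeights {
        public double tf(double freq) {
            return freq > 0 ? 1.0 : 0.0;
        }
        public double idf(long docFreq, long numDocs) {
            return 1.0;
        }
    }

    /** Simplified per-term score: tf * idf^2 (no norms, boosts, coord or queryNorm). */
    static double termScore(TermWeights w, double freq, long docFreq, long numDocs) {
        double idf = w.idf(docFreq, numDocs);
        return w.tf(freq) * idf * idf;
    }

    public static void main(String[] args) {
        long numDocs = 1_000_000L;   // assumed size of the address index
        long rareName = 5L;          // assumed number of documents with a rare last name
        long commonName = 50_000L;   // assumed number of documents with a common last name

        TermWeights classic = new ClassicWeights();
        TermWeights flat = new FlatWeights();

        // With the classic weights a match on the rare name scores higher than
        // a match on the common name; with the flat weights both score the same.
        System.out.println("classic rare:   " + termScore(classic, 1, rareName, numDocs));
        System.out.println("classic common: " + termScore(classic, 1, commonName, numDocs));
        System.out.println("flat rare:      " + termScore(flat, 1, rareName, numDocs));
        System.out.println("flat common:    " + termScore(flat, 1, commonName, numDocs));
    }
}
```

With the flat variant the constant itself does not affect the ranking within a single field, since every
match gets the same value. What would matter is the relative weight between fields, which in this
sketch is uniformly 1.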

Thanks a lot,
