lucene-java-user mailing list archives

From Erick Erickson <erickerick...@gmail.com>
Subject Re: Overriding DefaultSimilarity to not consider tf/idf and friends
Date Mon, 05 Nov 2012 12:10:30 GMT
First I'd see whether omitting term frequencies, positions, and norms does
what you need; these are all things you can disable out of the box.
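
To make the effect concrete, here is a rough sketch in plain Java (no Lucene
API involved; the class name, method name, and field names are made up for
illustration). It assumes that with tf, idf, and norms all neutralized, each
matching field contributes the same constant, and it ignores Lucene's
remaining factors such as coord and queryNorm. The constant is 1 because
these factors are multiplied together in the tf/idf formula, so 1 leaves the
product unchanged.

```java
import java.util.HashMap;
import java.util.Map;

public class FlatScoreSketch {

    // Hypothetical helper: every field whose single term matches the query
    // contributes a constant 1.0, regardless of how rare or frequent the
    // term is in the rest of the collection.
    static float flatScore(Map<String, String> query, Map<String, String> doc) {
        float score = 0f;
        for (Map.Entry<String, String> e : query.entrySet()) {
            String value = doc.get(e.getKey());
            if (value != null && value.equalsIgnoreCase(e.getValue())) {
                score += 1.0f; // constant stand-in for tf * idf * norm
            }
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, String> query = new HashMap<>();
        query.put("firstName", "Damian");
        query.put("lastName", "Birchler");

        Map<String, String> candidate = new HashMap<>();
        candidate.put("firstName", "damian");
        candidate.put("lastName", "Birchler");
        candidate.put("city", "Baden"); // made-up value, not part of the query

        // Both queried fields match, so the candidate gets two points.
        System.out.println(flatScore(query, candidate));
    }
}
```

With scoring this flat, a candidate's rank depends only on how many fields
agree, which is probably what you want for duplicate detection.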

Best
Erick


On Mon, Nov 5, 2012 at 5:26 AM, Damian Birchler
<Damian.Birchler@bsiag.com> wrote:

> Hi everyone,
>
> We are using Lucene to search for possible duplicates in an address
> database. We create an index with a document for each person in the
> database. Each document has a field with one term for the first name, a
> field with one term for the last name, and so on. I think in this setting it
> doesn’t make sense to let term frequency, inverse document frequency and
> friends influence the document score (or does it?). For this reason I’m
> thinking of overriding DefaultSimilarity to not take tf/idf into account
> when scoring.
>
> Do you think that’s a reasonable thing to do? If so, how should I proceed?
> I’m looking for implementation details here; should I, e.g., override the
> method that calculates the term frequency to just return a constant
> (although, off the top of my head, I wouldn’t know what a sensible
> constant would be)?
>
> Thanks a lot,
>
> Damian
