lucene-java-user mailing list archives

From "Robichaud, Jean-Philippe" <>
Subject RE: PerFieldSimilarity
Date Thu, 05 May 2005 17:05:18 GMT
Thanks for the clarification...

While studying the Similarity documentation in more depth, I discovered
something that troubles me a little.  The idf is calculated using the
following formula:

log(numDocInIndex / (numDocWithTerm_t + 1)) + 1

While I agree this is fine for most applications, it is not quite right for
mine.  numDocWithTerm_t is really numDocWith_t.text_in_field_t.field.  That's
fine with me; the problem is the other term, numDocInIndex.  I would like to
use numDocInIndex_having_t.field instead.  The reason is, again, that I want
the similarity score to be really meaningful.  I have 'classes' of documents
in the same index:
Document1: MeaningA="something here",ContentA="searchable text 1"
Document2: MeaningB="something else",ContentB="searchable text 2"

I have an unequal number of "A" and "B" documents.  The same query text will
be sent against ContentA and ContentB at the same time.  Since there are more
documents in class B than in class A, the "idf" should use a different
numDocInIndex value for each field.  Is there a good way to achieve that?
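
To make the concern concrete, here is a minimal standalone sketch in plain
Java (no Lucene classes involved) that evaluates the formula above with an
index-wide document count and with per-class counts.  The class name
IdfPerClass and all of the counts are hypothetical, chosen only for
illustration; how the per-field counts would be obtained from a real index is
left open.

```java
public class IdfPerClass {
    // Same shape as the formula quoted above: log(numDocs / (docFreq + 1)) + 1.
    public static float idf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    public static void main(String[] args) {
        // Hypothetical counts, for illustration only.
        int docsInIndex = 1000;      // all documents, classes A and B together
        int docsWithContentA = 100;  // documents that have a ContentA field
        int docsWithContentB = 900;  // documents that have a ContentB field
        int docFreq = 9;             // documents containing the term, same in both fields

        System.out.println("index-wide idf: " + idf(docFreq, docsInIndex));
        System.out.println("class A idf:    " + idf(docFreq, docsWithContentA));
        System.out.println("class B idf:    " + idf(docFreq, docsWithContentB));
    }
}
```

With the index-wide count, a term that matches 9 documents gets the same idf
in either field.  With per-class counts, the same term gets a lower idf in
the smaller class A than in class B, which is the distinction I am after.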

Thanks for all your help, 


-----Original Message-----
From: Doug Cutting [] 
Sent: Wednesday, May 04, 2005 5:10 PM
Subject: Re: PerFieldSimilarity

Robichaud, Jean-Philippe wrote:
> How cool, I did not know that...  that may help me...  If I understand you
> correctly, I can create a boolean query where each "clause" uses its own
> similarity?

Yes.  That would look something like:

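BooleanQuery booleanQuery = new BooleanQuery();
// Override getSimilarity() for this clause only, so that its idf is constant.
TermQuery clause1 = new TermQuery(new Term("foo", "bar")) {
     public Similarity getSimilarity(Searcher searcher) {
       return new DefaultSimilarity() {
         public float idf(int docFreq, int numDocs) { return 1.0f; }
       };
     }
};
booleanQuery.add(clause1, true, false);  // required, not prohibited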


To unsubscribe, e-mail:
For additional commands, e-mail:
