lucene-java-user mailing list archives

From Matthew Hall <mh...@informatics.jax.org>
Subject Re: IDF scoring issue
Date Wed, 17 Dec 2008 15:14:16 GMT
Well, you could also do a simple test of removing IDF from the scoring 
equation and seeing if the query then reacts the way you want it to.

Simply write your own custom similarity that does this, and test it to 
see how it behaves.

Handily enough, I've already done this, so here's some code you can try:


Fix the package declaration to something that works for you, and then 
simply use the custom similarity at the appropriate times.

======================================================================
package org.jax.mgi.shr.searchtool;

import org.apache.lucene.search.DefaultSimilarity;

/**
 * This is our custom similarity class, which removes document frequency
 * from the calculation of score.
 *
 * It extends the DefaultSimilarity class, and thus inherits most of its
 * methods from it.
 *
 * @author mhall
 *
 */

public class MGISimilarity extends DefaultSimilarity {

    /**
     * If the term has any doc frequency at all in the index, normalize it
     * to 1 (the term exists).
     *
     * Otherwise, return 0 (the term does not exist).
     *
     * @param docFreq
     * The number of documents that contain this term.
     * @param numDocs
     * The total number of documents in the index.
     *
     * This signature is dictated by the DefaultSimilarity class.
     *
     */
    @Override
    public float idf(int docFreq, int numDocs) {
        if (docFreq > 0) {
            return 1.0f;
        } else {
            return 0.0f;
        }
    }

}

===================================================================
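
To see what the override changes, here is a small standalone sketch (no Lucene on the classpath) that compares the flat idf above with the classic DefaultSimilarity formula, log(numDocs / (docFreq + 1)) + 1. The class name IdfComparison, the helper names and the document counts are made up for illustration; check the formula against the DefaultSimilarity source for your Lucene version.

```java
// Standalone illustration only: this class is hypothetical and not part of Lucene.
final class IdfComparison {

    // The classic DefaultSimilarity idf: log(numDocs / (docFreq + 1)) + 1.
    // Rare terms (small docFreq) get a larger value than common ones.
    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    // The flat idf from MGISimilarity: 1 if the term occurs at all, else 0.
    static float flatIdf(int docFreq, int numDocs) {
        return docFreq > 0 ? 1.0f : 0.0f;
    }

    public static void main(String[] args) {
        // Hypothetical index statistics, chosen only to show the contrast.
        int numDocs = 1_000_000;
        int rareDocFreq = 50;        // a term found in few documents
        int commonDocFreq = 200_000; // a term found in many documents

        System.out.printf("rare term:   default idf = %.3f, flat idf = %.1f%n",
                defaultIdf(rareDocFreq, numDocs), flatIdf(rareDocFreq, numDocs));
        System.out.printf("common term: default idf = %.3f, flat idf = %.1f%n",
                defaultIdf(commonDocFreq, numDocs), flatIdf(commonDocFreq, numDocs));
    }
}
```

With the default formula the rare term outweighs the common one; with the flat version every matching term contributes the same idf, so term frequency, length normalization and coord decide the ranking. To try the real class against an index, the similarity is normally set on the searcher (setSimilarity(new MGISimilarity()) in the Lucene releases of this period), but verify that against the javadoc of the version you run.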
Rajiv2 wrote:
> Because the search term is provided by a user, and that user would explicitly
> have to put quotes around "marietta ga" when I believe the search text as it
> is : fleming roofing inc., marietta ga  -- should score higher for "marietta
> ga"
>
> rajiv
>
>
> Grant Ingersoll-6 wrote:
>   
>> On Dec 16, 2008, at 8:19 PM, Rajiv2 wrote:
>>
>>     
>>> Hello,
>>>
>>> I'm using the default lucene Queryparser on the search text : fleming
>>> roofing inc., marietta ga
>>>
>>> Also, I don't want to modify the search text by putting quotes around
>>> "marietta ga" which forces the query parser to make a phrase query.
>>>       
>> Why not?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>>     
>
>   


-- 
Matthew Hall
Software Engineer
Mouse Genome Informatics
mhall@informatics.jax.org
(207) 288-6012




