lucene-java-user mailing list archives

From "Chris Sibert" <chrissib...@attbi.com>
Subject Scoring
Date Mon, 02 Sep 2002 07:11:08 GMT
I am dissatisfied with the document scores that I'm getting. If a document is short and has
one occurrence of the search term, it is ranked higher than a longer document with two occurrences
of the term. This makes little sense to me, and I'd like the longer document with more occurrences
to be ranked higher. I figure I have to override the scoring method, but I can't find where
Lucene actually does the scoring. This is not an uncommon problem for me: I find the API
confusing to peruse because it lacks comprehensive Javadoc documentation. (Something that even
Sun doesn't spend much time on.) I attempt to read the code, but the variable names are terse
and there's a dearth of comments, which makes it fairly unfathomable.
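
To make the complaint concrete, here is a minimal sketch of the kind of arithmetic I suspect is at work. It assumes, and this is only a guess since I haven't found the real code, that a term's contribution grows with the square root of its frequency and shrinks with the square root of the document's length. The class and method names are made up for illustration and are not part of Lucene's API.

```java
public class LengthNormSketch {

    // Hypothetical score factor (an assumption, not Lucene's actual code):
    // sqrt(term frequency) damped by 1 / sqrt(document length in terms).
    static double scoreFactor(int termFrequency, int documentLength) {
        return Math.sqrt(termFrequency) / Math.sqrt(documentLength);
    }

    public static void main(String[] args) {
        // Short document: 10 terms, 1 occurrence of the search term.
        double shortDoc = scoreFactor(1, 10);
        // Longer document: 100 terms, 2 occurrences of the search term.
        double longDoc = scoreFactor(2, 100);

        System.out.println("short document: " + shortDoc);
        System.out.println("long document:  " + longDoc);
        System.out.println("short ranks higher: " + (shortDoc > longDoc));
    }
}
```

Under that assumption the short document wins even though the longer one has twice as many occurrences, which matches the ordering I am seeing.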

This is the code that I'm using. Am I doing the right thing in using the Query object, or
should I be using a different one, such as TermQuery? Does TermQuery score differently, so
that I might be happier with its behavior? If not, where might I find the method that actually
computes the Document's score, so that I may modify it?


    Hits find ( String string_searchString, String string_indexPath )
    {
        Searcher     indexSearcher ;
        Analyzer     analyzer ;
        Query        query ;
        QueryParser  queryParser ;
        Hits         searchResults_Hits ;

        try
        {
            indexSearcher      = new IndexSearcher ( string_indexPath ) ;
            analyzer           = new SimpleAnalyzer () ;

            query              = QueryParser.parse ( string_searchString, "DocumentText", analyzer ) ;
            searchResults_Hits = indexSearcher.search ( query ) ;

            return searchResults_Hits ;
        }

