lucene-dev mailing list archives

From "Dan Climan" <dcli...@keepmedia.com>
Subject Retrieving Document Boosts
Date Thu, 21 Oct 2004 00:22:25 GMT
I was trying to test whether the Document Boosts I calculate and add during
indexing were being preserved correctly.
 
I understand that what's actually preserved by default is Field Boost *
Document Boost * lengthNorm
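
To make that relationship concrete, here is a minimal standalone sketch of the arithmetic. It does not call Lucene: the class NormSketch, its method names, and the token count of 100 are hypothetical, and it assumes lengthNorm is 1/sqrt(number of tokens in the field) while ignoring any precision lost when the norm is stored.

```java
// Standalone illustration only; does not use any Lucene classes.
final class NormSketch {
    // Assumed default: lengthNorm = 1 / sqrt(number of tokens in the field).
    static double lengthNorm(int numTokens) {
        return 1.0 / Math.sqrt(numTokens);
    }

    // The stored value as described above:
    // field boost * document boost * lengthNorm.
    static double norm(double fieldBoost, double docBoost, int numTokens) {
        return fieldBoost * docBoost * lengthNorm(numTokens);
    }

    // Dividing the norm by lengthNorm leaves field boost * document boost.
    static double combinedBoost(double norm, int numTokens) {
        return norm / lengthNorm(numTokens);
    }

    public static void main(String[] args) {
        int numTokens = 100; // hypothetical field length
        double n = norm(1.0, 2.0, numTokens);
        System.out.println("norm  = " + n);
        System.out.println("boost = " + combinedBoost(n, numTokens));
    }
}
```

With no boosts set (both 1.0), combinedBoost should come back as roughly 1.0 whenever the same token count is used on both sides of the division.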
 
I'm using the default similarity and initially had no field boosts or document
boosts, so I would have expected my initial query of document boosts to return
1.0, but the values varied, which likely means I'm not calculating it correctly.
 
Here's the Java I tried to use to calculate the document boost:
 
            IndexReader ir = IndexReader.open(indexDir);
            IndexSearcher searcher = new IndexSearcher(ir) ;
            // "FullText" is the name of the default field to be searched
            byte[] norms = ir.norms("FullText");
            StandardAnalyzer sa = new StandardAnalyzer();
            Similarity sim = searcher.getSimilarity();
            
            TermEnum terms = ir.terms();
            int numTerms = 0;
            while (terms.next())
            {
                Term t = terms.term();
                
                if (t.field().equals("FullText"))
                    numTerms++;
            }
            // lengthNorm is defined as 1/sqrt(numTerms) by default
            double lengthNorm = 1.0 / Math.sqrt(numTerms);

            String key = "bush"; // some term to be searched
            Query q2 = QueryParser.parse(key,  "FullText", sa);
            Hits hits2 = searcher.search(q2) ;
            // i.e. get the norm for the first hit returned in the search
            float f = sim.decodeNorm(norms[hits2.id(0)]);
            System.out.println("Boost: " + f / lengthNorm);
 
What am I missing in the calculation? I understand that there are precision
limitations, but the results I'm getting vary and are mostly in the range
71.68 to 143.35.
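
On the precision point: the norms come back as one byte per document (the byte[] from ir.norms), so only a few significant bits of each value can survive. The sketch below is not Lucene's encoder. It is a hypothetical helper, keepMantissaBits, that truncates a float to an assumed three mantissa bits, just to show how coarse such a value becomes.

```java
// Illustration only: NOT Lucene's norm encoding. Truncates a float's mantissa
// to a small number of bits to show the size of the resulting error.
final class PrecisionSketch {
    static float keepMantissaBits(float value, int bitsToKeep) {
        int dropped = 23 - bitsToKeep;     // IEEE 754 floats have 23 mantissa bits
        int mask = ~((1 << dropped) - 1);  // clears the low 'dropped' bits
        return Float.intBitsToFloat(Float.floatToIntBits(value) & mask);
    }

    public static void main(String[] args) {
        float exact = 0.3f;                // hypothetical norm value
        float coarse = keepMantissaBits(exact, 3); // three bits is an assumption
        System.out.println(exact + " -> " + coarse);
    }
}
```

An error of that size is a few percent per value, which would not by itself account for results as far from 1.0 as the ones above.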
 
Is there a faster way than iterating through the terms to calculate
lengthNorms?
 
