lucene-dev mailing list archives

From Doug Cutting <cutt...@apache.org>
Subject Re: Retrieving Document Boosts
Date Thu, 21 Oct 2004 02:35:11 GMT
Dan Climan wrote:
>             TermEnum terms = ir.terms();
>             int numTerms = 0;
>             while (terms.next())
>             {
>                 Term t = terms.term();
>                 
>                 if (t.field().equals("FullText"))
>                     numTerms++;
>             }
>             // since lengthNorm was defined as 1/sqrt(numTerms) by default
>             double lengthNorm = 1.0 / Math.sqrt(numTerms);

Here numTerms is not the number of unique terms in the collection, but
rather the number of tokens in that field of the document in question.
So, if you want to re-create this externally, you could re-tokenize the
text for the field, with the same analyzer used at index time, and count
the tokens.
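
As a rough sketch of that idea, the following counts the tokens in one
field value and applies the default 1/sqrt(numTerms) formula. The class
LengthNormSketch and its whitespace tokenize() helper are hypothetical
stand-ins, not Lucene API; in practice the token count should come from
the analyzer the index was built with. Keep in mind that the norm stored
in the index is also multiplied by any boosts and is encoded lossily into
a single byte, so a value recomputed this way will only approximate it.

```java
import java.util.ArrayList;
import java.util.List;

class LengthNormSketch {

    // Hypothetical stand-in for the index-time analyzer: splits on runs
    // of whitespace. Replace with the real analyzer's token output.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Default-style length norm: 1/sqrt(number of tokens in the field).
    // Rejects non-positive counts rather than returning Infinity or NaN
    // (a choice made for this sketch only).
    static double lengthNorm(int numTokens) {
        if (numTokens <= 0) {
            throw new IllegalArgumentException("field has no tokens");
        }
        return 1.0 / Math.sqrt(numTokens);
    }
}
```

For a field value that tokenizes into four tokens, lengthNorm(4) gives
1/sqrt(4) = 0.5, regardless of how many unique terms the whole index
holds for that field.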

Doug
