lucene-java-commits mailing list archives

From Apache Wiki <wikidi...@apache.org>
Subject [Lucene-java Wiki] Update of "SummerOfCode2011ProjectRankingTerrier" by DavidNemeskey
Date Mon, 20 Jun 2011 12:51:23 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "SummerOfCode2011ProjectRankingTerrier" page has been changed by DavidNemeskey:
http://wiki.apache.org/lucene-java/SummerOfCode2011ProjectRankingTerrier?action=diff&rev1=3&rev2=4

  As far as ''df'' goes, there is also `IndexReader.docFreq()`.
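A ''df'' value obtained this way would typically feed an idf component of a ranking function. The following is a minimal sketch, assuming the classic Robertson/Sparck Jones idf used in BM25. `IdfSketch` and `bm25Idf` are hypothetical names, not Lucene or Terrier API. The df and document count are passed in directly rather than read from an index.

```java
// Hypothetical sketch (not Lucene or Terrier code): BM25-style idf computed
// from a term's document frequency (df) and the number of documents (N).
final class IdfSketch {
    private IdfSketch() {}

    /**
     * Classic Robertson/Sparck Jones idf: log((N - df + 0.5) / (df + 0.5)).
     * Note: this variant becomes negative when df > N / 2.
     */
    static double bm25Idf(long df, long numDocs) {
        if (df < 0 || df > numDocs) {
            throw new IllegalArgumentException("df must be in [0, numDocs]");
        }
        return Math.log((numDocs - df + 0.5) / (df + 0.5));
    }
}
```

Rarer terms (smaller ''df'') get a larger idf. Which document count is passed as `numDocs` is the same choice discussed for the collection-level statistics.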
  
  Collection-level statistics seem to be harder to come by.
-  * ''number of fields'': `IndexReader.fields()`;
+  * ''number of fields'': `IndexReader.fields()`, '''BUT''' this statistic is only used for normalization, which Lucene performs outside of the `Similarity`; hence, we don't need it;
   * ''no. of tokens in a field'': `IndexReader.getSumOfNorms()`; it differs a bit from the real length; it may be worth having both, since more options leave more room for experimentation;
   * ''avg. field length'': has to be computed from the no. of tokens in each field, as in `MockBM25Similarity.avgDocumentLength()`;
   * ''no. of documents'': `IndexReader.numDocs()`, taken from the context (for some reason, `MockBM25Similarity` uses `maxDoc()` instead);
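The average field length in the list above is the total token count of a field divided by a document count. The following is a minimal sketch, assuming the per-document token counts are already available to the caller as an `int[]`. `CollectionStats`, `sumOfTokens` and `avgFieldLength` are hypothetical names, not Lucene API.

```java
// Hypothetical helper (not part of Lucene): collection-level length statistics
// computed from per-document token counts that the caller already has.
final class CollectionStats {
    private CollectionStats() {}

    /** Sums the token counts of one field over all documents. */
    static long sumOfTokens(int[] fieldLengths) {
        long sum = 0;
        for (int len : fieldLengths) {
            sum += len;
        }
        return sum;
    }

    /** Average field length: total tokens divided by the document count used. */
    static double avgFieldLength(long sumOfTokens, int docCount) {
        if (docCount <= 0) {
            return 0.0; // empty collection: avoid division by zero
        }
        return (double) sumOfTokens / docCount;
    }
}
```

For example, `CollectionStats.avgFieldLength(CollectionStats.sumOfTokens(new int[] {3, 5, 10, 2}), 4)` returns `5.0`. The choice of divisor matters when the index contains deletions. `maxDoc()` is one greater than the largest document number, so it still counts deleted documents, whereas `numDocs()` does not. Dividing by `maxDoc()` therefore yields a smaller average in that case.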
