lucene-java-commits mailing list archives

From Apache Wiki <>
Subject [Lucene-java Wiki] Update of "SummerOfCode2011ProjectRankingNotes" by DavidNemeskey
Date Wed, 06 Jul 2011 16:32:29 GMT
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Lucene-java Wiki" for change notification.

The "SummerOfCode2011ProjectRankingNotes" page has been changed by DavidNemeskey:

    * `score + boost`: I do not consider this a boost, but rather a sum of similarity scores, of which one happens to come from outside (e.g. PageRank)
    * `score * boost`
    * `score = tf(boost * freq) * idf`
+  * We prefer manual instantiation (for Similarities, parts thereof). Providers should be written manually.
+ === Problems ===
   * Language modeling would require custom aggregation of query terms
    * product instead of weighted sum (this could be solved by using log, but the query norm still messes it up)
    * decide which documents have a term, and which do not, because we have to weight them accordingly (p_t or 1 - p_t)
    * two types of aggregation?
     * per field (definitely Similarity-specific)
     * whole query (should be Similarity-specific too, but might be OK if fixed)
+  * What about phrases? LATER... sum(DF)
  === Questions about Lucene ===
   * Is it possible to design a scoring interface that is consistent across ranking frameworks?
-  * Class loader vs. manual instantiation? manual
   * How do contexts work?
-  * Do we use Apache Commons (e.g. Validate)? NO
-  * How many Similarities of a type? With Stats we definitely need to answer this. => manual provider
   * NormConverter? NO
   * Common Normalization, IDF, etc. TOO
-  * Name of Normalization, BasicModel, AfterEffect, Lambda, etc. classes? like now
-  * float vs double? FLOAT
-  * What about phrases? LATER... sum(DF)
-  * Where to put log2? EasySim for now
   * QueryWeight class
   * What to pass to score()?
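The three boost options listed at the top of the changed section can be compared side by side with a small Java sketch. The class and method names are hypothetical and are not Lucene API. The tf and idf forms (square-root tf, idf = 1 + ln(numDocs / (docFreq + 1))) are assumptions modelled on Lucene's classic tf-idf similarity, not something decided on this page.

```java
// Hypothetical sketch: three ways a boost (e.g. an external PageRank value)
// could enter a tf-idf term score. Not Lucene's API; tf() and idf() are assumed forms.
public final class BoostVariants {

    // Assumed tf: square root of the (possibly boosted) frequency.
    public static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    // Assumed idf: 1 + ln(numDocs / (docFreq + 1)).
    public static float idf(long docFreq, long numDocs) {
        return (float) (1.0 + Math.log(numDocs / (double) (docFreq + 1)));
    }

    // score + boost: really a sum of two scores, one coming from outside.
    public static float additive(float freq, float idf, float boost) {
        return tf(freq) * idf + boost;
    }

    // score * boost: the boost scales the whole term score.
    public static float multiplicative(float freq, float idf, float boost) {
        return tf(freq) * idf * boost;
    }

    // score = tf(boost * freq) * idf: the boost scales the raw frequency,
    // so its effect is damped by the sublinear tf().
    public static float insideTf(float freq, float idf, float boost) {
        return tf(boost * freq) * idf;
    }

    public static void main(String[] args) {
        float idf = idf(10, 1000);
        float freq = 4f;
        float boost = 4f;
        System.out.println("score + boost      = " + additive(freq, idf, boost));
        System.out.println("score * boost      = " + multiplicative(freq, idf, boost));
        System.out.println("tf(boost*freq)*idf = " + insideTf(freq, idf, boost));
    }
}
```

Under the assumed square-root tf, putting the boost inside tf() gives tf(boost * freq) * idf = (score * boost) / sqrt(boost), so a boost of 4 only doubles the term score; with a different tf the damping would differ.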

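The language-modeling problem noted under Problems can be made concrete with a multiple-Bernoulli query-likelihood sketch in Java: each query term contributes p_t if the document contains it and 1 - p_t if it does not, and because log is monotonic, the product of those factors ranks documents the same way as the sum of their logarithms. All names are hypothetical, p_t is passed in as a plain map rather than estimated from Lucene statistics, and List.of, Set.of and Map.of require Java 9 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a multiple-Bernoulli query-likelihood score.
// pT must contain an entry for every query term.
public final class BernoulliQueryLikelihood {

    // Direct product of per-term factors; can underflow for long queries.
    public static double productScore(List<String> queryTerms, Set<String> docTerms,
                                      Map<String, Double> pT) {
        double score = 1.0;
        for (String t : queryTerms) {
            double p = pT.get(t);
            score *= docTerms.contains(t) ? p : 1.0 - p;
        }
        return score;
    }

    // Same ranking as the product, but expressed as a sum, which is the
    // kind of aggregation a sum-based scorer can accumulate per term.
    public static double logSumScore(List<String> queryTerms, Set<String> docTerms,
                                     Map<String, Double> pT) {
        double score = 0.0;
        for (String t : queryTerms) {
            double p = pT.get(t);
            // The "absent" branch still contributes log(1 - p): the document
            // must be scored for this term even though it does not contain it.
            score += Math.log(docTerms.contains(t) ? p : 1.0 - p);
        }
        return score;
    }

    public static void main(String[] args) {
        Map<String, Double> pT = Map.of("lucene", 0.3, "ranking", 0.2);
        List<String> query = List.of("lucene", "ranking");
        Set<String> doc = Set.of("lucene", "index");
        double product = productScore(query, doc, pT); // 0.3 * (1 - 0.2)
        double logSum = logSumScore(query, doc, pT);
        System.out.println(product + " vs exp(logSum) = " + Math.exp(logSum));
    }
}
```

The absent-term branch is what makes this awkward for scorers that are only invoked on documents containing a given term: there, a missing term simply contributes nothing instead of log(1 - p_t).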