lucene-java-commits mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From mikemcc...@apache.org
Subject svn commit: r727475 - in /lucene/java/trunk/src/java/org/apache/lucene/search: BooleanScorer.java BooleanScorer2.java
Date Wed, 17 Dec 2008 19:16:11 GMT
Author: mikemccand
Date: Wed Dec 17 11:16:10 2008
New Revision: 727475

URL: http://svn.apache.org/viewvc?rev=727475&view=rev
Log:
add comments from Doug describing how BooleanScorers work

Modified:
    lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer.java
    lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java

Modified: lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer.java?rev=727475&r1=727474&r2=727475&view=diff
==============================================================================
--- lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer.java (original)
+++ lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer.java Wed Dec 17 11:16:10
2008
@@ -19,6 +19,39 @@
 
 import java.io.IOException;
 
+/* Description from Doug Cutting (excerpted from
+ * LUCENE-1483):
+ *
+ * BooleanScorer uses a ~16k array to score windows of
+ * docs. So it scores docs 0-16k first, then docs 16-32k,
+ * etc. For each window it iterates through all query terms
+ * and accumulates a score in table[doc%16k]. It also stores
+ * in the table a bitmask representing which terms
+ * contributed to the score. Non-zero scores are chained in
+ * a linked list. At the end of scoring each window it then
+ * iterates through the linked list and, if the bitmask
+ * matches the boolean constraints, collects a hit. For
+ * boolean queries with lots of frequent terms this can be
+ * much faster, since it does not need to update a priority
+ * queue for each posting, instead performing constant-time
+ * operations per posting. The only downside is that it
+ * results in hits being delivered out-of-order within the
+ * window, which means it cannot be nested within other
+ * scorers. But it works well as a top-level scorer.
+ *
+ * The new BooleanScorer2 implementation instead works by
+ * merging priority queues of postings, albeit with some
+ * clever tricks. For example, a pure conjunction (all terms
+ * required) does not require a priority queue. Instead it
+ * sorts the posting streams at the start, then repeatedly
+ * skips the first to to the last. If the first ever equals
+ * the last, then there's a hit. When some terms are
+ * required and some terms are optional, the conjunction can
+ * be evaluated first, then the optional terms can all skip
+ * to the match and be added to the score. Thus the
+ * conjunction can reduce the number of priority queue
+ * updates for the optional terms. */
+
 final class BooleanScorer extends Scorer {
   private SubScorer scorers = null;
   private BucketTable bucketTable = new BucketTable();

Modified: lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java
URL: http://svn.apache.org/viewvc/lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java?rev=727475&r1=727474&r2=727475&view=diff
==============================================================================
--- lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java (original)
+++ lucene/java/trunk/src/java/org/apache/lucene/search/BooleanScorer2.java Wed Dec 17 11:16:10
2008
@@ -22,6 +22,9 @@
 import java.util.List;
 import java.util.Iterator;
 
+/* See the description in BooleanScorer.java, comparing
+ * BooleanScorer & BooleanScorer2 */
+
 /** An alternative to BooleanScorer that also allows a minimum number
  * of optional scorers that should match.
  * <br>Implements skipTo(), and has no limitations on the numbers of added scorers.



Mime
View raw message