lucene-dev mailing list archives

From Karl Wright <daddy...@yahoo.com>
Subject Submission
Date Sun, 22 May 2005 01:09:58 GMT
I've been looking at the BooleanScorer code in 1.4.3 and realized that it has several problems.
 These are:
 
1) It processes documents in chunks of 1024 document ids.  This means its running time depends
on the number of indexed documents.
2) Finding the subscorer with the lowest document id scales linearly with the number of scorers
(one scorer per clause in the Boolean query).
3) It does not implement the skipTo() method, because its technique of doing 1024 document
ids at a time interferes with this.  This makes it impossible to use a BooleanScorer within
a ConjunctionScorer.
 
I've attached a rewritten BooleanScorer which solves these problems.  It basically uses a
btree to keep the individual subscorers, and it removes subscorers that have reached the end
of their documents.  It thus removes the dependency on the number of documents indexed, and
finding the subscorer with the lowest document id takes O(log(number of clauses)) instead of
O(number of clauses).
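
To make the idea concrete, here is a minimal sketch of an ordered-subscorer approach.  It is not
the attached code: it assumes a binary heap (java.util.PriorityQueue) in place of the btree, it
only merges document ids (no scoring and no required or prohibited clauses), and the names
SubScorer and DisjunctionSketch are hypothetical stand-ins rather than Lucene classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical stand-in for one clause's scorer: walks a sorted array of doc ids. */
final class SubScorer {
    private final int[] docs; // ascending document ids
    private int pos = -1;

    SubScorer(int... docs) { this.docs = docs; }

    /** Moves to the next document; returns false when exhausted. */
    boolean next() { return ++pos < docs.length; }

    /** Advances at least once, stopping at the first doc id >= target. */
    boolean skipTo(int target) {
        do {
            if (!next()) return false;
        } while (docs[pos] < target);
        return true;
    }

    int doc() { return docs[pos]; }
}

/** Sketch of an OR over subscorers, kept ordered by their current doc id. */
final class DisjunctionSketch {
    // Assumption: a binary heap instead of a btree; both give O(log n) reordering.
    private final PriorityQueue<SubScorer> heap =
        new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
    private int current = -1;

    DisjunctionSketch(List<SubScorer> subs) {
        for (SubScorer s : subs) {
            if (s.next()) heap.add(s); // subscorers with no documents are never added
        }
    }

    /** Moves to the next matching document id, if any. */
    boolean next() { return skipTo(current + 1); }

    /**
     * Moves to the first matching doc id >= target.  In this sketch the scorer
     * stays put if it is already at or beyond target.
     */
    boolean skipTo(int target) {
        while (!heap.isEmpty() && heap.peek().doc() < target) {
            SubScorer s = heap.poll();          // remove before its doc id changes
            if (s.skipTo(target)) heap.add(s);  // exhausted subscorers are dropped
        }
        if (heap.isEmpty()) return false;
        current = heap.peek().doc();
        return true;
    }

    int doc() { return current; }
}
```

Each subscorer that has to move costs one poll and one add, so the work per step grows with
log(number of clauses) and does not depend on how many documents are indexed.  A caller would
loop with `while (d.next()) { ... d.doc() ... }`, or call `d.skipTo(target)` from an enclosing
conjunction.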
 
Thanks
Karl
 


		