lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From bugzi...@apache.org
Subject DO NOT REPLY [Bug 34407] - BooleanQuery assumes everything else implements skipTo
Date Tue, 12 Apr 2005 18:46:09 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG·
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=34407>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND·
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=34407





------- Additional Comments From paul.elschot@xs4all.nl  2005-04-12 20:46 -------
(In reply to comment #2) 
... 
>  
> > During development of a new scorer one can temporarily use the code  
> > shown in the javadocs of Scorer that implements skipTo with  
> > next() and doc().  
>  
> I'm not sure I can do that. 
> What I'm trying to do is develop a replacement for RangeQuery that is fast 
and 
> *always* works (no expanding to BooleanQuery).  I don't even care about 
scoring 
> since it almost never makes sense for a RangeQuery.  Using the same 
techniques 
> as RangeFilter, I think it should be pretty easy to do, except for 
implementing 
> skipTo. 
 
A scorer that is only used in isolation or on the top level 
will only have next() called. 
For these it is safe to throw an UnsupportedOperationException 
from skipTo(). 
 
For an OR like query one could use the same technique as RangeFilter 
by using the scorers of the clauses separately. You might call this 
a DisjunctionFilter. 
 
Each of the separate scorers could implement skipTo(), but it wouldn't 
normally be used. 
Using skipTo would be only useful for very dense results to skip to 
the doc corresponding to the next unset bit in the BitSet. 
However, such dense results are not normal for text searching. 
 
Once a BitSet filter is available bug 32965 can also be useful. 
This uses skipTo on the scorer of the filtered query 
to skip over documents not present in the filter. 
 
>  
> It seems like skipTo for UnscoredRangeQuery would require either enumerating 
> *all* docs beforehand (store in a BitSet or whatever), or keeping a termdoc 
> enumerator open for every term in the range.  Neither option seems 
attractive. 
 
skipTo() requires that the documents are accessed in order. That means some 
form of sorting is needed: for example a PriorityQueue with all termdoc 
enumerators open, or a BitSet for distribution sort. 
BooleanScorer with skipTo allowed uses a form of distribution sort 
(the buckets) combined with local sorting. 
Other methods are also  possible, eg. merge sort, but these are currently 
not used in Lucene scorers. 
 
BooleanScorer without skipTo uses distribution and an incomplete form 
of sorting by working over intervals of document numbers. 
I don't think scoring disjunctions can be made faster than that. 
Since the sorting is not complete, the next() method does not guarantee 
that documents are accessed in order. BooleanScorers without skipTo 
using the same interval can be nested nicely, though. 
 
Regards, 
Paul Elschot 
 

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message