lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Rasik Pandey" <rasik.pan...@ajlsm.com>
Subject RE : RE : TR : Possible Bug with MultiSearcher?
Date Fri, 20 Sep 2002 15:14:41 GMT
Doug,
Understood. I made the second change,in MulitSeacher, and it works on
this end. Do you think this change needs to be made in other places of
the lucene code, such as the SegmentsReader.readerIndex(int n) method,
as it uses what looks to be the same algorithm?


Thanks,
Rasik

-----Message d'origine-----
De : Doug Cutting [mailto:cutting@lucene.com] 
Envoyé : vendredi 20 septembre 2002 00:05
À : Lucene Developers List
Objet : Re: RE : TR : Possible Bug with MultiSearcher?




Rasik Pandey wrote:
> Developers,
> Attached is the diff for MultiSearcher which seems to correct these
> bugs. I have not yet found any problems caused by these changes in 
> testing........but we will keep you informed!

[ ... ]

diff -w -r1.4 MultiSearcher.java
96,98c100,105
<   public final Document doc(int n) throws IOException {
<     int i = searcherIndex(n);			  // find searcher index
<     return searchers[i].doc(n - starts[i]);
---
 >   public Document doc(int n) throws IOException {
 >     int i = searcherIndex(n);
 >     if (i < 0 || i > searchables.length)
 >         return null;
 >     else
 >       return searchables[i].doc(n - starts[i]);

I don't agree with this change.  If you pass in a bogus document number 
then an exception should be thrown here.  ArrayIndexOutOfBounds is not 
the friendliest of exceptions, but it's better than returning null.

> 113c120
> <       else if (n > midValue)
> ---
> 
>>      else if (n >= midValue) //made this a ">=", to ensure that a
>> valid index is found

This is the standard binary search algorithm, and should not be changed 
lightly (according to Knuth, no less).  The problem is when there are 
multiple entries with the same value then this algorithm doesn't always 
return the index of the largest one, which is required in this case.  So

I think the fix is to scan for the largest value:

--- MultiSearcher.java.~1.5.~	2002-07-17 10:28:56.000000000 -0700
+++ MultiSearcher.java	2002-09-19 15:02:02.000000000 -0700
@@ -116,9 +116,13 @@
  	hi = mid - 1;
        else if (n > midValue)
  	lo = mid + 1;
-      else
+      else {
+        while (mid+1 < searchables.length && starts[mid+1] == midValue)
{
+          mid++;                                  // return last match
+        }
  	return mid;
      }
+    }
      return hi;
    }

That passes the test suite just checked in too!

Any objections to this?

Doug


--
To unsubscribe, e-mail:
<mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail:
<mailto:lucene-dev-help@jakarta.apache.org>


--
To unsubscribe, e-mail:   <mailto:lucene-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:lucene-dev-help@jakarta.apache.org>


Mime
View raw message