lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Michael McCandless <luc...@mikemccandless.com>
Subject Re: Bug in TopFieldCollector?
Date Mon, 30 Mar 2009 08:37:08 GMT
I agree, this is not a pleasant migration path forward from 2.4.

I think maybe a good fix is to not even require IndexReader[]
subReaders to be passed in, in the first place.  Tracing downwards,
the only reason why we needs this array at construction time is for
the SortField.CUSTOM case, when it calls
FieldComparatorSource.newComparator().  This is the new API for a
custom sort.  The old API took the toplevel IndexReader, so when we
carried over to the new API we replaced that with the array of
subreaders.

But, because FieldComparator is a more advanced API that steps through
each sub-reader as it goes, I don't think we need to provide the array
of subreaders up front.

So I think we should simply remove the IndexReader[] as an arg,
entirely?  It's a new (in 2.9) API so we are free to change it.

Mike

On Mon, Mar 30, 2009 at 4:20 AM, Shai Erera <serera@gmail.com> wrote:
> Already did !
>
> Another question - I think we somehow broke TopFieldCollector ...
> Previously, in TopFieldDocCollector, it accepted an IndexReader as a
> parameter, and now it requires IndexReader[], which is called subReaders.
> Calling the 'fast' search methods with Sort has no problem obtaining that
> IndexReader[] (and sort it), but how would someone use TopFieldCollector w/o
> calling the appropriate Searcher methods?
>
> For example, since all the Searcher methods pass in fillFields = true, I
> wanted to use the method Searcher.search(Query, TopFieldCollector) in the
> test case I wrote, which BTW looks like this:
>
>   public void testSortWithoutFillFields() throws Exception {
>
>     // There was previously a bug in TopFieldCollector when fillFields was
> set
>     // to false - the same doc and score was set in ScoreDoc[] array. This
> test
>     // asserts that if fillFields is false, the documents are set properly.
> It
>     // does not use Searcher's default search methods (with Sort) since all
> set
>     // fillFields to true.
>     Sort sort = new Sort();
>     int nDocs=10;
>
>     TopDocsCollector tdc = new TopFieldCollector(sort, nDocs,
>         new IndexReader[] { ((IndexSearcher) full).getIndexReader() },
> false);
>
>     full.search(new MatchAllDocsQuery(), tdc);
>
>     ScoreDoc[] sd = tdc.topDocs().scoreDocs;
>     for (int i = 1; i < sd.length; i++) {
>       assertTrue(sd[i].doc != sd[i - 1].doc);
>     }
>   }
>
> You'll notice that creating a TopFieldCollector now is much more complicated
> and *ugly*. As a user of IndexSearcher, I can only call getIndexReader()
> which returns a single IndexReader. I don't have access to gatherSubReaders
> and sortSubReaders. I don't see why I should have access to them. So it
> forces me to create a dummy array with a single IndexReader.
>
> There are two ways I see to solve it:
> 1. Introduce a getIndexReaders() method on IndexSearcher, which will return
> an array of (sorted?) IndexReader.
> 2. Introduce a new constructor in TopFieldCollector which accepts a single
> IndexReader and make the other one package-private (for use by IndexSearcher
> only). That constructor can internally create a dummy array of readers, but
> at least it's private to the constructor and not exposed to the rest of the
> world.
>
> Otherwise, I think it ruins TopFieldCollector and will make it a lot less
> intuitive to use. At least, people who'd want to move from
> TopFieldDocCollector to TopFieldCollector, will find it very inconvenient
> and strange.
>
> What do you think? I can do that (2) as part of 1575. If (1) is better, then
> I think a different issue should be opened, because we might want to return
> such an array as sorted or something, which makes it less trivial.
>
> Shai
>
> On Mon, Mar 30, 2009 at 11:07 AM, Michael McCandless
> <lucene@mikemccandless.com> wrote:
>>
>> Looks like quite a bug, Shai!  Thanks.  It came in with LUCENE-1483.
>> I would say add test case & fix it under 1575.
>>
>> Mike
>>
>> On Mon, Mar 30, 2009 at 3:50 AM, Shai Erera <serera@gmail.com> wrote:
>> > Hi
>> >
>> > As I prepared the patch for 1575, I noticed a strange implementation in
>> > TopFieldCollector's topDocs():
>> >
>> >     ScoreDoc[] scoreDocs = new ScoreDoc[queue.size()];
>> >     if (fillFields) {
>> >       for (int i = queue.size() - 1; i >= 0; i--) {
>> >         scoreDocs[i] = queue.fillFields((FieldValueHitQueue.Entry)
>> > queue.pop());
>> >       }
>> >     } else {
>> >       Entry entry = (FieldValueHitQueue.Entry) queue.pop();
>> >       for (int i = queue.size() - 1; i >= 0; i--) {
>> >         scoreDocs[i] = new FieldDoc(entry.docID,
>> >                                     entry.score);
>> >       }
>> >     }
>> >
>> >     return new TopFieldDocs(totalHits, scoreDocs, queue.getFields(),
>> > maxScore);
>> >
>> >
>> > Notice that if fillFields is true, then documents are popped from the
>> > queue.
>> > However if it's false, then the first document is popped out of the
>> > queue
>> > and used to populate the entire ScoreDoc[]? I believe that's wrong,
>> > right?
>> > Otherwise, the returned TopFieldDocs.scoreDocs array will include the
>> > same
>> > document and score?
>> >
>> > I noticed there's no test case for that ... TopFieldCollector's
>> > constructor
>> > is called only from IndexSearcher.search(Weight, Filter, int, Sort,
>> > boolean
>> > /* fillFields */) which is called from IndexSearcher.search(Weight,
>> > Filter,
>> > int, sort) with fillFields always set to true. So perhaps that's why
>> > this
>> > hasn't showed up?
>> >
>> > BTW, the now deprecated TopFieldDocCollector's topDocs() implementation
>> > looks like this:
>> >
>> >     FieldSortedHitQueue fshq = (FieldSortedHitQueue)hq;
>> >     ScoreDoc[] scoreDocs = new ScoreDoc[fshq.size()];
>> >     for (int i = fshq.size()-1; i >= 0; i--)      // put docs in
array
>> >       scoreDocs[i] = fshq.fillFields ((FieldDoc) fshq.pop());
>> >
>> >     return new TopFieldDocs(totalHits, scoreDocs,
>> >                             fshq.getFields(), fshq.getMaxScore());
>> >
>> > It assumes fillFields is always true and always pops elements out of the
>> > queue.
>> >
>> > If this is a bug, I can fix it as part of 1575, as I'm touching that
>> > class
>> > anyway. I can also add a test case ... The fix is very simple BTW, just
>> > move
>> > the line "Entry entry = (FieldValueHitQueue.Entry) queue.pop();" inside
>> > the
>> > for loop.
>> >
>> > Shai
>> >
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-dev-help@lucene.apache.org
>>
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org


Mime
View raw message