lucene-java-user mailing list archives

From Vadim Gindin <vgin...@detectum.com>
Subject Re: Read DocValue twice
Date Tue, 20 Feb 2018 10:30:09 GMT
Adrien, would you be so kind as to look at my Query/Weight/Scorer? I'm
attaching the files that contain them. I've removed unnecessary code such as
initialization, toString, equals and so on. Finally, how is the iterator
supposed to work correctly? Shouldn't it be possible to navigate through it
twice, once in score() and once in explain()?

Regards,
Vadim Gindin
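
The value doc=2147483647 reported further down the thread is Integer.MAX_VALUE, which Lucene uses as DocIdSetIterator.NO_MORE_DOCS. The sketch below is a toy model, not Lucene code: ToyIterator and the doc IDs 3, 7, 12 are hypothetical, and it assumes a forward-only iterator whose advance() always moves at least one step. Under that assumption it shows why an iterator that scoring has already moved past a document cannot be advanced back to it, while a fresh instance can.

```java
class ForwardOnlyDemo {
    // Same value as Lucene's DocIdSetIterator.NO_MORE_DOCS.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    /** Hypothetical forward-only iterator over sorted doc IDs (not a Lucene class). */
    static final class ToyIterator {
        private final int[] docs;
        private int pos = -1;

        ToyIterator(int... docs) {
            this.docs = docs;
        }

        /** Moves at least one step forward, to the first doc >= target. */
        int advance(int target) {
            while (pos < docs.length) {
                pos++;
                if (pos < docs.length && docs[pos] >= target) {
                    return docs[pos];
                }
            }
            return NO_MORE_DOCS; // exhausted: only the sentinel is left
        }
    }

    /** Scoring and explaining share one instance. */
    static int reuseForExplain() {
        ToyIterator shared = new ToyIterator(3, 7, 12);
        shared.advance(12);       // scoring already moved the iterator to doc 12
        return shared.advance(7); // explaining doc 7 cannot move backwards
    }

    /** Explaining uses its own, unconsumed instance. */
    static int freshForExplain() {
        ToyIterator fresh = new ToyIterator(3, 7, 12);
        return fresh.advance(7);
    }

    public static void main(String[] args) {
        System.out.println(reuseForExplain()); // 2147483647
        System.out.println(freshForExplain()); // 7
    }
}
```

In a real Weight, one way to get the second behaviour is for explain() to build its own scorer for the segment instead of sharing state with scoring. Whether that matches the attached classes is an assumption, since they are not shown here.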

On Mon, Feb 19, 2018 at 10:55 PM, Adrien Grand <jpountz@gmail.com> wrote:

> Yes, this is the problem. This doc ID is a special sentinel value that
> means that the iterator is exhausted. I don't have enough context to know
> what the exact problem is but there is a bug in your custom query.
>
> On Mon, Feb 19, 2018 at 4:07 PM, Vadim Gindin <vgindin@detectum.com> wrote:
>
> > I have the scorer that is similar to DisjunctionScorer.java with
> >
> > private final DisiPriorityQueue subScorers;
> > private final DisjunctionDISIApproximation approximation;
> >
> > They are initialized in the constructor like this:
> >
> >    this.subScorers = new DisiPriorityQueue(subScorers.size());
> >    for (Scorer scorer : subScorers) {
> >        final DisiWrapper w = new DisiWrapper(scorer);
> >        this.subScorers.add(w);
> >    }
> >    this.approximation = new DisjunctionDISIApproximation(this.subScorers);
> >
> >
> >
> > I use them in score() and in explain(). In explain() I do
> >
> >    this.approximation.advance(doc);
> >
> > followed by the same code as in score(). I've also added logging, and
> > here is one of the logged lines:
> >
> > explain: doc=2147483647, field=params, maxDoc=67649
> >
> > The doc value doesn't look right.
> >
> >
> > On Mon, Feb 19, 2018 at 7:32 PM, Adrien Grand <jpountz@gmail.com> wrote:
> >
> > > Can you add some debug logging to see what the values of topList.doc
> > > and reader.maxDoc() are before you call advanceExact?
> > >
> > > What do you mean by "I reuse the same DisiPriorityQueue of scorers in
> > > score() and explain()"? This shouldn't be possible.
> > >
> > > On Mon, Feb 19, 2018 at 3:23 PM, Vadim Gindin <vgindin@detectum.com> wrote:
> > >
> > > > I use these calls in both cases. In score() and explain() I have the
> > > > following code:
> > > >
> > > > SortedNumericDocValues numDocVal = DocValues.getSortedNumeric(reader, fieldName);
> > > > if (numDocVal != null && numDocVal.advanceExact(topList.doc)) {
> > > >     long val = numDocVal.nextValue();
> > > >
> > > >     ..
> > > > }
> > > >
> > > > I reuse the same DisiPriorityQueue of scorers in score() and explain().
> > > >
> > > > On Mon, Feb 19, 2018 at 6:54 PM, Adrien Grand <jpountz@gmail.com> wrote:
> > > >
> > > > > If you want to read the values again, you need to call setDocument
> > > > > (Lucene < 7.0) or advanceExact (Lucene >= 7.0) before calling nextValue().
> > > > >
> > > > > On Mon, Feb 19, 2018 at 2:41 PM, Vadim Gindin <vgindin@detectum.com> wrote:
> > > > >
> > > > > > Hi all
> > > > > >
> > > > > > I use DocValues in a scoring function, i.e. I have a column of
> > > > > > integers that are used in the scoring formula. So I have a scorer
> > > > > > that calculates the scoring function twice:
> > > > > > - in score()
> > > > > > - in explain()
> > > > > >
> > > > > > I got the following error in explain:
> > > > > >
> > > > > > Caused by: java.lang.IndexOutOfBoundsException
> > > > > >         at java.nio.Buffer.checkIndex(Buffer.java:540) ~[?:1.8.0_161]
> > > > > >         at java.nio.DirectByteBuffer.get(DirectByteBuffer.java:253) ~[?:1.8.0_161]
> > > > > >         at org.apache.lucene.store.ByteBufferGuard.getByte(ByteBufferGuard.java:118) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
> > > > > >         at org.apache.lucene.store.ByteBufferIndexInput$SingleBufferImpl.readByte(ByteBufferIndexInput.java:385) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
> > > > > >         at org.apache.lucene.util.packed.DirectReader$DirectPackedReader8.get(DirectReader.java:145) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
> > > > > >         at org.apache.lucene.codecs.lucene70.Lucene70DocValuesProducer$3.longValue(Lucene70DocValuesProducer.java:481) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
> > > > > >         at org.apache.lucene.index.SingletonSortedNumericDocValues.nextValue(SingletonSortedNumericDocValues.java:73) ~[lucene-core-7.1.0.jar:7.1.0 84c90ad2c0218156c840e19a64d72b8a38550659 - ubuntu - 2017-10-13 16:12:42]
> > > > > >
> > > > > > I've found the following comment in the source code of
> > > > > > SortedNumericDocValues.java:
> > > > > >
> > > > > > /**
> > > > > >  * Iterates to the next value in the current document.  Do not call this more than {@link #docValueCount} times
> > > > > >  * for the document.
> > > > > >  */
> > > > > > public abstract long nextValue() throws IOException;
> > > > > >
> > > > > >
> > > > > > Questions:
> > > > > > 1) Why can't I read the values twice?
> > > > > > 2) How can I handle this situation?
> > > > > > 3) Can it work for NumericDocValues?
> > > > > >
> > > > > > Regards,
> > > > > > Vadim Gindin
> > > > > >
> > > > >
> > > >
> > >
> >
>
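
The advice in the thread (call advanceExact before calling nextValue() again, and never call nextValue() more than docValueCount() times per document) can be pictured with a small model. This is a sketch under stated assumptions, not the Lucene implementation: ToyValues, readFirst and the fallback parameter are hypothetical names, the data lives in a plain map, and unlike Lucene the toy does not require doc IDs to be visited in non-decreasing order.

```java
import java.util.HashMap;
import java.util.Map;

class DocValuesCursorDemo {
    /** Hypothetical stand-in for a multi-valued numeric doc-values cursor (not a Lucene class). */
    static final class ToyValues {
        private final Map<Integer, long[]> valuesByDoc;
        private long[] current = new long[0];
        private int upto = 0;

        ToyValues(Map<Integer, long[]> valuesByDoc) {
            this.valuesByDoc = valuesByDoc;
        }

        /** Positions on doc and resets the value cursor; false if doc has no values. */
        boolean advanceExact(int doc) {
            long[] values = valuesByDoc.get(doc);
            current = (values == null) ? new long[0] : values;
            upto = 0;
            return current.length > 0;
        }

        int docValueCount() {
            return current.length;
        }

        /** Returns the next value; fails once the document's values are used up. */
        long nextValue() {
            if (upto >= current.length) {
                throw new IllegalStateException("nextValue() called more than docValueCount() times");
            }
            return current[upto++];
        }
    }

    /** Re-positions before every read, so repeated reads of the same doc succeed. */
    static long readFirst(ToyValues values, int doc, long fallback) {
        return values.advanceExact(doc) ? values.nextValue() : fallback;
    }

    public static void main(String[] args) {
        Map<Integer, long[]> data = new HashMap<>();
        data.put(5, new long[] {42L});
        ToyValues values = new ToyValues(data);
        System.out.println(readFirst(values, 5, -1L)); // 42
        System.out.println(readFirst(values, 5, -1L)); // 42 again, cursor was reset
    }
}
```

Calling nextValue() a second time without re-positioning is the case the toy rejects with an exception. In the thread the same misuse surfaced as an IndexOutOfBoundsException deep inside the codec.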
