lucene-java-user mailing list archives

From Ivan Brusic <i...@brusic.com>
Subject Re: Duplicate values in search
Date Tue, 29 Dec 2015 15:01:45 GMT
Thanks Adrien. I added the BaseScorer to the gist, but what I was hoping to
learn was which direction I should take to debug this issue. I was not
focusing on the scorers since I did not need to upgrade them, and I actually
do not think I ever wrote my own Scorer in Lucene. I am taking the next few
days off, so I will get back to looking into it soon.

Ivan

On Mon, Dec 28, 2015 at 5:41 PM, Adrien Grand <jpountz@gmail.com> wrote:

> Ivan, I can't find the BaseScorer class in the gist. Maybe you forgot to
> git add it?
>
> On Mon, Dec 28, 2015 at 11:07 PM, Ivan Brusic <ivan@brusic.com> wrote:
>
> > Here is the complete code:
> > https://gist.github.com/brusic/e3018a2e403f5707fa3e
> >
> > The code is not originally mine, so I do not take responsibility. Once I
> > get things to perform correctly, I will do another pass with improvements.
> > Much of the custom code needs to be re-thought.
> >
> > The scorer is one class that I did not need to update, so I did not focus
> > on it. Will do so now.
> >
> > Ivan
> >
> > On Mon, Dec 28, 2015 at 4:58 PM, Adrien Grand <jpountz@gmail.com> wrote:
> >
> > > Hi Ivan,
> > >
> > > It looks like your scorer is emitting the same document twice. Maybe you
> > > could try to use AssertingIndexSearcher in your test case; this is the
> > > kind of thing that it should catch.
> > >
> > > The only related Lucene 5 change that I can think of is that Lucene now
> > > requires docs to be collected in order. Did this scorer use to collect
> > > docs out of order in Lucene 4?
> > >
> > > If that still doesn't help and if you can share the code of your scorer,
> > > I could give it a quick look.
> > >
> > > On Mon, Dec 28, 2015 at 10:18 PM, Ivan Brusic <ivan@brusic.com> wrote:
> > >
> > > > I just migrated a ton of code from Lucene 4.10 to 5.4. Lots of custom
> > > > collectors, analyzers, queries, etc. I have migrated other code bases
> > > > between Lucene versions before (2->3, 3->4) and I always had one issue
> > > > I could not eyeball!
> > > >
> > > > When using a custom query, I get the same document twice in the result
> > > > set. The changes I made for the upgrade had to do with the query/weight
> > > > API change.
> > > >
> > > > Without getting into the custom code, here is the simple test case:
> > > >
> > > > @BeforeClass
> > > > public static void buildIndex() throws IOException {
> > > >     ANALYZER = new StandardAnalyzer();
> > > >     IndexWriterConfig config = new IndexWriterConfig(ANALYZER);
> > > >     DIRECTORY = new RAMDirectory();
> > > >     try (IndexWriter writer = new IndexWriter(DIRECTORY, config)) {
> > > >         // removed for brevity
> > > >         // repeated five times with different values
> > > >         Document doc = new Document();
> > > >         doc.add(...);
> > > >         writer.addDocument(doc);
> > > >     }
> > > > }
> > > >
> > > > @Test
> > > > public void testQuery() throws IOException {
> > > >     try (IndexReader reader = DirectoryReader.open(DIRECTORY)) {
> > > >         IndexSearcher searcher = new IndexSearcher(reader);
> > > >
> > > >         PriorityQuery query = new PriorityQuery();
> > > >         query.add(new TermQuery(new Term("foo", "xyz")));
> > > >         query.add(new TermQuery(new Term("bar", "xyz")));
> > > >         query.add(new TermQuery(new Term("baz", "xyz")));
> > > >
> > > >         // The search call was missing from the posted snippet; the
> > > >         // hit limit of 10 is an assumption.
> > > >         TopDocs result = searcher.search(query, 10);
> > > >
> > > >         CheckHits.checkDocIds("Invalid docs", new int[] {4, 2, 0, 3},
> > > >                 result.scoreDocs);
> > > >     }
> > > > }
> > > >
> > > > There should be four unique results out of five since the second
> > > > document (docId 1) does not contain the term xyz. The results instead
> > > > contain 5 documents, with the first one appearing twice at the start:
> > > >
> > > > [doc=4 score=1.1976817 shardIndex=0, doc=4 score=1.1976817
> > > > shardIndex=0, doc=2 score=0.63170385 shardIndex=0, doc=0
> > > > score=0.37223506 shardIndex=0, doc=3 score=0.34156355 shardIndex=0]
> > > >
> > > > When using a BooleanQuery, the results are correct, so obviously the
> > > > custom Query is failing somehow. In all my years of Lucene, I never
> > > > had the same document twice. :) Without boring everyone with the
> > > > custom code, what should I be looking for? Just cannot quite spot it.
> > > >
> > > > Cheers,
> > > >
> > > > Ivan
> > > >
> > >
> >
>
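The two symptoms discussed in the thread (a doc ID appearing twice in the hits, and a scorer that may not emit docs in increasing order) can be checked with a few lines of plain Java while debugging. This is only a rough sketch and not part of the gist: the class DocIdChecks and both method names are hypothetical, and it works on plain int arrays rather than on Lucene types, so the doc IDs would first have to be copied out of result.scoreDocs or logged from the scorer.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical debugging helper; it does not use or imitate any Lucene API.
final class DocIdChecks {

    private DocIdChecks() {}

    /** Returns each doc ID that occurs more than once, in order of first repeat. */
    static List<Integer> duplicateDocIds(int[] docIds) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> duplicates = new ArrayList<>();
        for (int id : docIds) {
            // Set.add returns false when the ID was already present.
            if (!seen.add(id) && !duplicates.contains(id)) {
                duplicates.add(id);
            }
        }
        return duplicates;
    }

    /** Returns true if every emitted doc ID is greater than the previous one. */
    static boolean strictlyIncreasing(int[] emitted) {
        for (int i = 1; i < emitted.length; i++) {
            if (emitted[i] <= emitted[i - 1]) {
                return false;
            }
        }
        return true;
    }
}
```

Applied to the hit list from the original post, duplicateDocIds(new int[] {4, 4, 2, 0, 3}) reports doc 4. The second check is meant for the sequence of doc IDs the custom scorer emits within a segment, before sorting by score; a repeated or decreasing ID there would point at the scorer rather than at the collector.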
