lucene-java-user mailing list archives

From Ivan Brusic <i...@brusic.com>
Subject Re: Duplicate values in search
Date Mon, 28 Dec 2015 22:07:10 GMT
Here is the complete code:
https://gist.github.com/brusic/e3018a2e403f5707fa3e

The code is not originally mine, so I do not take responsibility. Once I
get things to perform correctly, I will do another pass with improvements.
Much of the custom code needs to be re-thought.

The scorer is one class that I did not need to update, so I did not focus
on it. Will do so now.

Ivan

On Mon, Dec 28, 2015 at 4:58 PM, Adrien Grand <jpountz@gmail.com> wrote:

> Hi Ivan,
>
> It looks like your scorer is emitting the same document twice. Maybe you
> could try using AssertingIndexSearcher in your test case; this is the kind
> of thing that it should catch.
>
> The only related Lucene 5 change that I can think of is that Lucene now
> requires docs to be collected in order. Did this scorer collect docs
> out of order in Lucene 4?
>
> If that still doesn't help and if you can share the code of your scorer, I
> could give it a quick look.
>
> On Mon, Dec 28, 2015 at 22:18, Ivan Brusic <ivan@brusic.com> wrote:
>
> > I just migrated a ton of code from Lucene 4.10 to 5.4: lots of custom
> > collectors, analyzers, queries, etc. I have migrated other code bases
> > across Lucene versions before (2->3, 3->4), and I always had one issue
> > I could not eyeball!
> >
> > When using a custom query, I get the same document twice in the result
> > set. The changes I made for the upgrade had to do with the query/weight
> > API change.
> >
> > Without getting into the custom code, here is the simple test case:
> >
> > @BeforeClass
> > public static void buildIndex() throws IOException {
> >     ANALYZER = new StandardAnalyzer();
> >     IndexWriterConfig config = new IndexWriterConfig(ANALYZER);
> >     DIRECTORY = new RAMDirectory();
> >     try (IndexWriter writer = new IndexWriter(DIRECTORY, config)) {
> >         // removed for brevity
> >         // repeated five times with different values
> >         Document doc = new Document();
> >         doc.add(...);
> >         writer.addDocument(doc);
> >     }
> > }
> >
> > @Test
> > public void testQuery() throws IOException {
> >     try (IndexReader reader = DirectoryReader.open(DIRECTORY)) {
> >         IndexSearcher searcher = new IndexSearcher(reader);
> >
> >         PriorityQuery query = new PriorityQuery();
> >         query.add(new TermQuery(new Term("foo", "xyz")));
> >         query.add(new TermQuery(new Term("bar", "xyz")));
> >         query.add(new TermQuery(new Term("baz", "xyz")));
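> >
> >         // The search call was missing from the posted snippet; the hit
> >         // count of 10 is an assumption (the index holds five documents).
> >         TopDocs result = searcher.search(query, 10);
> >
> >         CheckHits.checkDocIds("Invalid docs", new int[] {4, 2, 0, 3},
> >                 result.scoreDocs);
> >     }
> > }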
> >
> > There should be four unique results out of five, since the second
> > document (docId 1) does not contain the term xyz. Instead, the results
> > contain five documents, with the first hit appearing twice at the start:
> >
> > [doc=4 score=1.1976817 shardIndex=0, doc=4 score=1.1976817
> > shardIndex=0, doc=2 score=0.63170385 shardIndex=0, doc=0
> > score=0.37223506 shardIndex=0, doc=3 score=0.34156355 shardIndex=0]
> >
> > When using a BooleanQuery, the results are correct, so obviously the
> > custom Query is failing somehow. In all my years of Lucene, I have never
> > had the same document twice. :) Without boring everyone with the
> > custom code, what should I be looking for? I just cannot quite spot it.
> >
> > Cheers,
> >
> > Ivan
> >
>
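
The two symptoms discussed in this thread (a doc ID repeated in the hits,
and a scorer that may not emit docs in order) can be checked with plain
Java, independent of Lucene. What follows is only a rough sketch under
stated assumptions: it is not Lucene code and does not reproduce what
AssertingIndexSearcher does internally. The class name DocIdChecks, both
method names, and the "emitted" sequence are hypothetical; only the hit
list [4, 4, 2, 0, 3] comes from the message above. The assumption behind
the order check is that a scorer should emit strictly increasing doc IDs
within a segment, so emitting the same doc twice breaks that rule as well.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
public class DocIdChecks {

    /**
     * Returns the index of the first doc ID that is not strictly greater
     * than its predecessor, or -1 if the sequence is strictly increasing.
     */
    public static int firstOrderViolation(int[] emitted) {
        for (int i = 1; i < emitted.length; i++) {
            if (emitted[i] <= emitted[i - 1]) {
                return i;
            }
        }
        return -1;
    }

    /** Returns each doc ID that occurs more than once, in order of first repeat. */
    public static List<Integer> duplicates(int[] hits) {
        Set<Integer> seen = new HashSet<>();
        List<Integer> dups = new ArrayList<>();
        for (int doc : hits) {
            // Set.add returns false when the doc ID was already seen.
            if (!seen.add(doc) && !dups.contains(doc)) {
                dups.add(doc);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Doc IDs of the hits reported in the thread, in score order.
        int[] hits = {4, 4, 2, 0, 3};
        System.out.println("duplicate hits: " + duplicates(hits));

        // Assumed emission order for a scorer that returns doc 4 twice.
        int[] emitted = {0, 2, 3, 4, 4};
        System.out.println("first order violation at index: "
                + firstOrderViolation(emitted));
    }
}
```

Note that the order check applies to the sequence a scorer emits, not to
the final hit list, which is sorted by score and so is not expected to be
in doc ID order.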
