lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Duplicate values in search
Date Mon, 28 Dec 2015 21:58:59 GMT
Hi Ivan,

It looks like your scorer is emitting the same document twice. You could
try using AssertingIndexSearcher in your test case; this is the kind of
thing that it should catch.

The only related Lucene 5 change that I can think of is that Lucene now
requires docs to be collected in order. Did this scorer collect docs out
of order in Lucene 4?
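
To make the in-order requirement concrete, here is a minimal sketch in plain Java. It is not Lucene code: DocOrderCheck and violations are hypothetical names, and the int arrays stand in for the doc IDs a scorer emits within one segment, in emission order. The assumption is that a well-behaved scorer emits strictly increasing doc IDs, so a duplicate or a backwards step shows up as a violation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of Lucene: checks doc-ID ordering of a scorer's output. */
public class DocOrderCheck {

    /**
     * Returns the positions at which a doc ID is not strictly greater than
     * the previous one, i.e. a duplicate or an out-of-order document.
     */
    public static List<Integer> violations(int[] emittedDocIds) {
        List<Integer> bad = new ArrayList<>();
        for (int i = 1; i < emittedDocIds.length; i++) {
            if (emittedDocIds[i] <= emittedDocIds[i - 1]) {
                bad.add(i);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        // Strictly increasing doc IDs: no violations.
        System.out.println(violations(new int[] {0, 2, 3, 4}));    // prints []
        // Doc 4 emitted twice: the second occurrence is flagged.
        System.out.println(violations(new int[] {0, 2, 3, 4, 4})); // prints [4]
    }
}
```

Logging the doc IDs your scorer returns and running them through a check like this may show whether the duplicate comes from the scorer itself or from somewhere later in the collection path.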

If that still doesn't help and if you can share the code of your scorer, I
could give it a quick look.

On Mon, 28 Dec 2015 at 22:18, Ivan Brusic <ivan@brusic.com> wrote:

> I just migrated a ton of code from Lucene 4.10 to 5.4. Lots of custom
> collectors, analyzers, queries, etc. I have migrated other code bases from
> Lucene before (2->3, 3->4) and I always had one issue I could not eyeball!
>
> When using a custom query, I get the same document twice in the result set.
> The changes I made for the upgrade had to do with the query/weight API
> change.
>
> Without getting into the custom code, here is the simple test case:
>
> @BeforeClass
> public static void buildIndex() throws IOException {
>     ANALYZER = new StandardAnalyzer();
>     IndexWriterConfig config = new IndexWriterConfig(ANALYZER);
>     DIRECTORY = new RAMDirectory();
>     try (IndexWriter writer = new IndexWriter(DIRECTORY, config)) {
>         // removed for brevity
>         // repeated five times with different values
>         Document doc = new Document();
>         doc.add(...);
>         writer.addDocument(doc);
>     }
> }
>
> @Test
> public void testQuery() throws IOException {
>     try (IndexReader reader = DirectoryReader.open(DIRECTORY)) {
>         IndexSearcher searcher = new IndexSearcher(reader);
>
>         PriorityQuery query = new PriorityQuery();
>         query.add(new TermQuery(new Term("foo", "xyz")));
>         query.add(new TermQuery(new Term("bar", "xyz")));
>         query.add(new TermQuery(new Term("baz", "xyz")));
>
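>         // The search call was missing from the original post; the hit
>         // count of 10 is an assumption.
>         TopDocs result = searcher.search(query, 10);
>
>         CheckHits.checkDocIds("Invalid docs", new int[] {4, 2, 0, 3}, result.scoreDocs);
>     }
> }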
>
> There should be four unique results out of five since the second
> document (docId 1) does not contain the term xyz. The results instead
> contain 5 documents, with the first one appearing twice at the start:
>
> [doc=4 score=1.1976817 shardIndex=0, doc=4 score=1.1976817
> shardIndex=0, doc=2 score=0.63170385 shardIndex=0, doc=0
> score=0.37223506 shardIndex=0, doc=3 score=0.34156355 shardIndex=0]
>
> When using a BooleanQuery, the results are correct, so obviously the
> custom Query is failing somehow. In all my years of Lucene, I never
> had the same document twice. :) Without boring everyone with the
> custom code, what should I be looking for? Just cannot quite spot it.
>
> Cheers,
>
> Ivan
>
