lucene-java-user mailing list archives

From Adrien Grand <jpou...@gmail.com>
Subject Re: Duplicate values in search
Date Mon, 28 Dec 2015 22:41:47 GMT
Ivan, I can't find the BaseScorer class in the gist. Maybe you forgot to
git add it?

On Mon, Dec 28, 2015 at 23:07, Ivan Brusic <ivan@brusic.com> wrote:

> Here is the complete code:
> https://gist.github.com/brusic/e3018a2e403f5707fa3e
>
> The code is not originally mine, so I take no responsibility for it. Once I
> get things working correctly, I will do another pass with improvements.
> Much of the custom code needs to be rethought.
>
> The scorer is one class that I did not need to update, so I did not focus
> on it. Will do so now.
>
> Ivan
>
> On Mon, Dec 28, 2015 at 4:58 PM, Adrien Grand <jpountz@gmail.com> wrote:
>
> > Hi Ivan,
> >
> > It looks like your scorer is emitting the same document twice. Maybe you
> > could try using AssertingIndexSearcher in your test case; this is the kind
> > of thing it should catch.
> >
> > The only related Lucene 5 change I can think of is that Lucene now
> > requires docs to be collected in order. Did this scorer collect docs
> > out of order in Lucene 4?
> >
> > If that still doesn't help and you can share the code of your scorer, I
> > could give it a quick look.
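Both suggestions above rest on one invariant: a scorer must return doc IDs in strictly increasing order, so a repeated doc and an out-of-order doc break the same rule. The following is a minimal, stdlib-only sketch of that check, not the real AssertingIndexSearcher code; the `DocIterator` interface, `CheckingIterator`, and the helper methods are hypothetical, loosely modelled on Lucene's `DocIdSetIterator` (whose `NO_MORE_DOCS` sentinel is also `Integer.MAX_VALUE`).

```java
import java.util.ArrayList;
import java.util.List;

public class InOrderCheck {
    // Sentinel for "iterator exhausted"; Lucene's DocIdSetIterator uses the same value.
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    // Hypothetical stand-in for a scorer's doc-ID iteration.
    interface DocIterator {
        int nextDoc();
    }

    // Wraps an iterator and fails fast if a doc ID repeats or goes backwards.
    static final class CheckingIterator implements DocIterator {
        private final DocIterator in;
        private int last = -1;

        CheckingIterator(DocIterator in) {
            this.in = in;
        }

        @Override
        public int nextDoc() {
            int doc = in.nextDoc();
            if (doc != NO_MORE_DOCS && doc <= last) {
                throw new IllegalStateException("doc " + doc + " returned after doc " + last
                        + " (duplicate or out of order)");
            }
            last = doc;
            return doc;
        }
    }

    // Iterator over a fixed array of doc IDs, for testing only.
    static DocIterator of(int... docs) {
        return new DocIterator() {
            private int i = 0;

            @Override
            public int nextDoc() {
                return i < docs.length ? docs[i++] : NO_MORE_DOCS;
            }
        };
    }

    // Drains an iterator through the check and returns the doc IDs it produced.
    static List<Integer> drain(DocIterator it) {
        List<Integer> out = new ArrayList<>();
        DocIterator checked = new CheckingIterator(it);
        for (int doc = checked.nextDoc(); doc != NO_MORE_DOCS; doc = checked.nextDoc()) {
            out.add(doc);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(drain(of(0, 2, 3, 4))); // prints [0, 2, 3, 4]
        try {
            drain(of(0, 2, 3, 4, 4)); // doc 4 emitted twice
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints doc 4 returned after doc 4 (duplicate or out of order)
        }
    }
}
```

Wrapping a custom scorer's iteration in a check of this kind during a test makes it fail as soon as a doc ID repeats, instead of letting the duplicate reach the collector and surface only in the final hit list.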
> >
> > On Mon, Dec 28, 2015 at 22:18, Ivan Brusic <ivan@brusic.com> wrote:
> >
> > > I just migrated a ton of code from Lucene 4.10 to 5.4: lots of custom
> > > collectors, analyzers, queries, etc. I have migrated other code bases
> > > across Lucene versions before (2->3, 3->4), and each time there was one
> > > issue I could not eyeball!
> > >
> > > When using a custom query, I get the same document twice in the result
> > > set. The changes I made for the upgrade had to do with the Query/Weight
> > > API change.
> > >
> > > Without getting into the custom code, here is the simple test case:
> > >
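> > > // Imports assumed: Lucene 5.4 core, lucene-test-framework (CheckHits), JUnit 4
> > > import java.io.IOException;
> > > import org.apache.lucene.analysis.Analyzer;
> > > import org.apache.lucene.analysis.standard.StandardAnalyzer;
> > > import org.apache.lucene.document.Document;
> > > import org.apache.lucene.index.*;
> > > import org.apache.lucene.search.*;
> > > import org.apache.lucene.store.Directory;
> > > import org.apache.lucene.store.RAMDirectory;
> > > import org.junit.BeforeClass;
> > > import org.junit.Test;
> > >
> > > // Shared by all tests in the class
> > > private static Analyzer ANALYZER;
> > > private static Directory DIRECTORY;
> > >
> > > @BeforeClass
> > > public static void buildIndex() throws IOException {
> > >     ANALYZER = new StandardAnalyzer();
> > >     IndexWriterConfig config = new IndexWriterConfig(ANALYZER);
> > >     DIRECTORY = new RAMDirectory();
> > >     try (IndexWriter writer = new IndexWriter(DIRECTORY, config)) {
> > >         // removed for brevity
> > >         // repeated five times with different values
> > >         Document doc = new Document();
> > >         doc.add(...);
> > >         writer.addDocument(doc);
> > >     }
> > > }
> > >
> > > @Test
> > > public void testQuery() throws IOException {
> > >     try (IndexReader reader = DirectoryReader.open(DIRECTORY)) {
> > >         IndexSearcher searcher = new IndexSearcher(reader);
> > >
> > >         // PriorityQuery is the custom query under test (not shown here)
> > >         PriorityQuery query = new PriorityQuery();
> > >         query.add(new TermQuery(new Term("foo", "xyz")));
> > >         query.add(new TermQuery(new Term("bar", "xyz")));
> > >         query.add(new TermQuery(new Term("baz", "xyz")));
> > >
> > >         // Run the search; 10 is an assumed hit count, enough for 5 docs
> > >         TopDocs result = searcher.search(query, 10);
> > >         CheckHits.checkDocIds("Invalid docs", new int[] {4, 2, 0, 3},
> > >                 result.scoreDocs);
> > >     }
> > > }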
> > >
> > > There should be four unique results out of five documents, since the
> > > second document (docId 1) does not contain the term xyz. Instead, the
> > > results contain 5 documents, with the top hit (doc 4) appearing twice:
> > >
> > > [doc=4 score=1.1976817 shardIndex=0, doc=4 score=1.1976817
> > > shardIndex=0, doc=2 score=0.63170385 shardIndex=0, doc=0
> > > score=0.37223506 shardIndex=0, doc=3 score=0.34156355 shardIndex=0]
> > >
> > > When using a BooleanQuery, the results are correct, so the custom
> > > Query is obviously failing somehow. In all my years of using Lucene, I
> > > have never seen the same document returned twice. :) Without boring
> > > everyone with the custom code, what should I be looking for? I just
> > > cannot quite spot it.
> > >
> > > Cheers,
> > >
> > > Ivan
> > >
> >
>
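On the question of what to look for: if `PriorityQuery` combines its clauses the way a disjunction does (an assumption; its code is not shown in this thread), one generic way to emit a document twice is to advance only one of the sub-scorers positioned on the current doc, leaving the others to produce that doc again on the next call. The sketch below, plain Java with no Lucene dependency, shows a doc-at-a-time merge that avoids this. The hypothetical `Sub` objects wrap sorted `int[]` doc IDs with made-up scores and sit in a `PriorityQueue` keyed on their current doc; every sub-iterator on that doc is popped, scored, and advanced before the doc is emitted once. Scores are simply summed, which only loosely resembles a BooleanQuery of SHOULD clauses (Lucene's coord factor and other scoring details are ignored).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

public class DisjunctionSketch {
    // Hypothetical sub-iterator: sorted doc IDs plus one score per doc.
    static final class Sub {
        final int[] docs;
        final float[] scores;
        int pos = 0;

        Sub(int[] docs, float[] scores) {
            this.docs = docs;
            this.scores = scores;
        }

        int doc() { return docs[pos]; }
        boolean exhausted() { return pos >= docs.length; }
    }

    // One emitted hit: doc ID and the summed score of every matching clause.
    static final class Hit {
        final int doc;
        final float score;

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    static List<Hit> merge(List<Sub> subs) {
        // Min-heap keyed on each sub-iterator's current doc ID.
        PriorityQueue<Sub> heap = new PriorityQueue<>((a, b) -> Integer.compare(a.doc(), b.doc()));
        for (Sub s : subs) {
            if (!s.exhausted()) heap.add(s);
        }
        List<Hit> hits = new ArrayList<>();
        while (!heap.isEmpty()) {
            int doc = heap.peek().doc();
            float score = 0f;
            // Pop and advance EVERY sub-iterator positioned on this doc so it is
            // emitted exactly once. Advancing only the top one would leave the
            // others on `doc`, and the next round would emit it again.
            // Each Sub is polled before advancing so the heap never holds a stale key.
            while (!heap.isEmpty() && heap.peek().doc() == doc) {
                Sub s = heap.poll();
                score += s.scores[s.pos];
                s.pos++;
                if (!s.exhausted()) heap.add(s);
            }
            hits.add(new Hit(doc, score));
        }
        return hits;
    }

    public static void main(String[] args) {
        // Three clauses (think foo:xyz, bar:xyz, baz:xyz); doc 1 matches none.
        // The scores are illustrative values, not taken from the thread.
        List<Sub> subs = Arrays.asList(
                new Sub(new int[] {0, 4}, new float[] {0.5f, 1.0f}),
                new Sub(new int[] {2, 4}, new float[] {0.5f, 0.25f}),
                new Sub(new int[] {3}, new float[] {0.25f}));
        for (Hit h : merge(subs)) {
            System.out.println(h.doc + " " + h.score);
        }
        // prints:
        // 0 0.5
        // 2 0.5
        // 3 0.25
        // 4 1.25
    }
}
```

Each matching doc comes out once in doc-ID order: doc 1 is absent because no clause matches it, and doc 4, matched by two clauses, is emitted a single time with their combined score.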
