lucene-dev mailing list archives

From "Doron Cohen (JIRA)" <j...@apache.org>
Subject [jira] [Updated] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats
Date Tue, 06 Sep 2011 07:44:10 GMT

     [ https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3412:
--------------------------------

    Attachment: LUCENE-3412.patch

I am able to see this inconsistent behavior!

The attached patch contains a test that fails on this. The test currently prints the trial number.
The first loop always passes in all 30 trials (expected), while the second loop always fails
(for me) but is inconsistent about when it fails: sometimes on the first iteration, other
times on the 3rd, 9th, etc.

Quite peculiar... investigating...
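To make the structure of such a test concrete, here is a minimal hypothetical harness. It is not the code in LUCENE-3412.patch. It runs a check for a fixed number of trials, prints each trial number, and reports the first trial that fails. The check is passed in as an `IntPredicate`. In a real test it would run the sloppy phrase query and compare the hits against the expected document set.

```java
import java.util.function.IntPredicate;

public class TrialHarness {
    // Runs check.test(trial) for trials 1..n, printing each trial number,
    // and returns the first trial for which the check fails, or -1 if all pass.
    static int firstFailingTrial(int n, IntPredicate check) {
        for (int trial = 1; trial <= n; trial++) {
            System.out.println("trial " + trial);
            if (!check.test(trial)) {
                return trial;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical check that always passes, like the first loop described above.
        int result = firstFailingTrial(30, t -> true);
        System.out.println("first failing trial: " + result);
    }
}
```

With a non-deterministic check, the returned trial number varies from run to run. That matches the behaviour reported for the second loop.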

> SloppyPhraseScorer returns non-deterministic results for queries with many repeats
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-3412
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3412
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>    Affects Versions: 3.1, 3.2, 3.3, 4.0
>            Reporter: Michael Ryan
>            Assignee: Doron Cohen
>         Attachments: LUCENE-3412.patch
>
>
> Proximity queries with many repeats (four or more, based on my testing) return non-deterministic
> results. I run the same query multiple times with the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog". http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query. http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: Only the document with id 2 should be returned.
> Actual results: The document with id 2 is always returned; the document with id 1 is
> sometimes returned as well.
> Other proximity values ("dog dog dog dog"~5, "dog dog dog dog"~100, etc.) show the same
> behavior.
> So far I've traced it down to the "repeats" array in SloppyPhraseScorer.initPhrasePositions():
> depending on the order of the elements in this array, the document may or may not match.
> I think the HashSet may be to blame, but I'm not sure; that at least seems to be where the
> non-determinism is coming from.
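The expected result can be explained by counting positions. This assumes, as the expected result implies, that each of the four query terms must be matched at a distinct position in the document. Document 1 contains "dog" only three times, so no slop value should let it match. Below is a minimal sketch of that necessary condition. It ignores slop and position distances entirely, and the class and method names are hypothetical, not Lucene API. `List.of` requires Java 9 or later.

```java
import java.util.*;

public class RepeatCountCheck {
    // Counts how often each term occurs in a token list.
    static Map<String, Integer> counts(List<String> tokens) {
        Map<String, Integer> m = new HashMap<>();
        for (String t : tokens) {
            m.merge(t, 1, Integer::sum);
        }
        return m;
    }

    // Necessary (not sufficient) condition for a phrase match when every query
    // term must occupy a distinct document position: the document must contain
    // each term at least as many times as the phrase repeats it. Slop is ignored.
    static boolean mayMatch(List<String> phrase, List<String> doc) {
        Map<String, Integer> docCounts = counts(doc);
        for (Map.Entry<String, Integer> e : counts(phrase).entrySet()) {
            if (docCounts.getOrDefault(e.getKey(), 0) < e.getValue()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> query = List.of("dog", "dog", "dog", "dog");
        System.out.println(mayMatch(query, List.of("dog", "dog", "dog")));        // false (document 1)
        System.out.println(mayMatch(query, List.of("dog", "dog", "dog", "dog"))); // true (document 2)
    }
}
```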
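The HashSet suspicion is plausible in general. When elements do not override `hashCode()`, a `HashSet` buckets them by identity hash code. Freshly allocated objects get different identity hash codes, so iteration order can change from one query execution to the next. The sketch below illustrates this with a hypothetical `Cursor` class standing in for per-query state; it is not Lucene's `PhrasePositions`. A `LinkedHashSet`, which iterates in insertion order, makes the order deterministic. That is only one possible remedy, not necessarily the fix applied to SloppyPhraseScorer.

```java
import java.util.*;

public class RepeatOrderDemo {
    // Hypothetical stand-in for per-query cursor state. It does not override
    // hashCode/equals, so a HashSet buckets instances by identity hash code.
    static final class Cursor {
        final int offset;
        Cursor(int offset) { this.offset = offset; }
    }

    // Adds n fresh cursors to the given set and returns their offsets in iteration order.
    static List<Integer> order(Set<Cursor> set, int n) {
        for (int i = 0; i < n; i++) {
            set.add(new Cursor(i));
        }
        List<Integer> out = new ArrayList<>();
        for (Cursor c : set) {
            out.add(c.offset);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<List<Integer>> hashOrders = new HashSet<>();
        Set<List<Integer>> linkedOrders = new HashSet<>();
        for (int trial = 0; trial < 30; trial++) {
            hashOrders.add(order(new HashSet<>(), 4));         // order may differ per trial
            linkedOrders.add(order(new LinkedHashSet<>(), 4)); // always [0, 1, 2, 3]
        }
        System.out.println("distinct HashSet orders: " + hashOrders.size());
        System.out.println("distinct LinkedHashSet orders: " + linkedOrders.size()); // always 1
    }
}
```

The number of distinct `HashSet` orders depends on the JVM's identity hash codes, so it is not fixed. The `LinkedHashSet` count is always 1.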

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

