Date: Fri, 2 Sep 2011 21:21:11 +0000 (UTC)
From: "Robert Muir (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <92980719.12622.1314998471640.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <633793445.12602.1314997749985.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Commented] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

[
https://issues.apache.org/jira/browse/LUCENE-3412?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13096308#comment-13096308 ]

Robert Muir commented on LUCENE-3412:
-------------------------------------

This issue could also be related to LUCENE-3215: in some cases with repeats, SloppyPhraseScorer returns scores of Infinity... what scores are you getting?

However, I don't think it's a duplicate issue. With LUCENE-3215 the problem occurs when you have a sloppy phrase query + repeats + positionIncrements > 1 (e.g. stopwords and enablePositionIncrements=true, the default).

> SloppyPhraseScorer returns non-deterministic results for queries with many repeats
> ----------------------------------------------------------------------------------
>
>                 Key: LUCENE-3412
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3412
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: core/search
>   Affects Versions: 3.1, 3.2, 3.3, 4.0
>            Reporter: Michael Ryan
>
> Proximity queries with many repeats (four or more, based on my testing) return non-deterministic results. I run the same query multiple times with the same data set and get different results.
> So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and latest 4.0 trunk.
> Steps to reproduce (using the Solr example):
> 1) In solrconfig.xml, set queryResultCache size to 0.
> 2) Add some documents with text "dog dog dog" and "dog dog dog dog".
>    http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
> 3) Do a "dog dog dog dog"~1 query.
>    http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
> 4) Repeat step 3 many times.
> Expected results: The document with id 2 should be returned.
> Actual results: The document with id 2 is always returned. The document with id 1 is sometimes returned.
> Different proximity values show the same bug - "dog dog dog dog"~5, "dog dog dog dog"~100, etc show the same behavior.
> So far I've traced it down to the "repeats" array in SloppyPhraseScorer.initPhrasePositions() - depending on the order of the elements in this array, the document may or may not match. I think the HashSet may be to blame, but I'm not sure - that at least seems to be where the non-determinism is coming from.
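
To illustrate the HashSet suspicion quoted above, here is a minimal Java sketch. It is not Lucene code. It assumes the repeating positions are collected in a hash-based set and that their hash codes can differ from one run to the next (as identity hash codes can). RepeatOrderSketch, Position, and iterationOrder are hypothetical names, and the explicit hash values only simulate two separate runs.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class RepeatOrderSketch {

    /** Hypothetical stand-in for a phrase position; this is not Lucene's PhrasePositions. */
    static final class Position {
        final String name;
        final int hash; // simulates a hash code that may change between runs

        Position(String name, int hash) {
            this.name = name;
            this.hash = hash;
        }

        @Override public int hashCode() { return hash; }
        @Override public boolean equals(Object o) { return this == o; }
        @Override public String toString() { return name; }
    }

    /** Adds pp0, pp1, ... to the set in that order and returns the set's iteration order. */
    static List<String> iterationOrder(Set<Position> set, int... hashes) {
        for (int i = 0; i < hashes.length; i++) {
            set.add(new Position("pp" + i, hashes[i]));
        }
        List<String> order = new ArrayList<>();
        for (Position p : set) {
            order.add(p.name);
        }
        return order;
    }

    public static void main(String[] args) {
        int[] runA = {3, 1, 2, 0}; // simulated hash codes in one run
        int[] runB = {0, 2, 1, 3}; // same insertions, different simulated hash codes

        // HashSet iteration order is unspecified and depends on the hash codes,
        // so these two lines may print the elements in different orders.
        System.out.println("HashSet, run A:       " + iterationOrder(new HashSet<>(), runA));
        System.out.println("HashSet, run B:       " + iterationOrder(new HashSet<>(), runB));

        // LinkedHashSet iterates in insertion order regardless of hash codes,
        // so both lines print [pp0, pp1, pp2, pp3].
        System.out.println("LinkedHashSet, run A: " + iterationOrder(new LinkedHashSet<>(), runA));
        System.out.println("LinkedHashSet, run B: " + iterationOrder(new LinkedHashSet<>(), runB));
    }
}
```

If something like this is what happens to the "repeats" array, an insertion-ordered collection would be one way to make its order stable. Whether that is the right fix for SloppyPhraseScorer is a separate question.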