Date: Fri, 2 Sep 2011 21:09:09 +0000 (UTC)
From: "Michael Ryan (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Message-ID: <633793445.12602.1314997749985.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Created] (LUCENE-3412) SloppyPhraseScorer returns non-deterministic results for queries with many repeats

SloppyPhraseScorer returns non-deterministic results for queries with many repeats
----------------------------------------------------------------------------------

Key: LUCENE-3412
URL:
https://issues.apache.org/jira/browse/LUCENE-3412
Project: Lucene - Java
Issue Type: Bug
Components: core/search
Affects Versions: 3.3, 3.2, 3.1, 4.0
Reporter: Michael Ryan


Proximity queries with many repeats (four or more, based on my testing) return non-deterministic results. I run the same query multiple times against the same data set and get different results.

So far I've reproduced this with Solr 1.4.1, 3.1, 3.2, 3.3, and the latest 4.0 trunk.

Steps to reproduce (using the Solr example):
1) In solrconfig.xml, set the queryResultCache size to 0.
2) Add some documents with text "dog dog dog" and "dog dog dog dog".
http://localhost:8983/solr/update?stream.body=%3Cadd%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E1%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%3C/field%3E%3C/doc%3E%3Cdoc%3E%3Cfield%20name=%22id%22%3E2%3C/field%3E%3Cfield%20name=%22text%22%3Edog%20dog%20dog%20dog%3C/field%3E%3C/doc%3E%3C/add%3E&commit=true
3) Do a "dog dog dog dog"~1 query.
http://localhost:8983/solr/select?q=%22dog%20dog%20dog%20dog%22~1
4) Repeat step 3 many times.

Expected results: Only the document with id 2 should be returned.

Actual results: The document with id 2 is always returned. The document with id 1 is sometimes returned.

Other proximity values show the same behavior: "dog dog dog dog"~5, "dog dog dog dog"~100, etc.

So far I've traced it down to the "repeats" array in SloppyPhraseScorer.initPhrasePositions() - depending on the order of the elements in this array, the document may or may not match. I think the HashSet may be to blame, but I'm not sure - that at least seems to be where the non-determinism is coming from.
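To illustrate the HashSet suspicion, here is a minimal, self-contained Java sketch. It is not Lucene code: RepeatsOrderSketch, Pos, and iterationOrder are hypothetical names made up for this example. It assumes the objects placed in the set do not override hashCode(), so a HashSet buckets them by identity hash code and its iteration order is not guaranteed to be the same from one run to the next. Whether that is exactly what happens inside SloppyPhraseScorer is an assumption, not something confirmed here. A LinkedHashSet, by contrast, always iterates in insertion order.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RepeatsOrderSketch {

    /** Hypothetical stand-in for a per-term position tracker; it does NOT override hashCode(). */
    public static final class Pos {
        final String label;

        Pos(String label) {
            this.label = label;
        }
    }

    /** Adds one Pos per label to the given set, then returns the labels in the set's iteration order. */
    public static List<String> iterationOrder(Set<Pos> set, String[] labels) {
        for (String label : labels) {
            set.add(new Pos(label));
        }
        List<String> order = new ArrayList<String>();
        for (Pos p : set) {
            order.add(p.label);
        }
        return order;
    }

    public static void main(String[] args) {
        String[] labels = {"dog#0", "dog#1", "dog#2", "dog#3"};
        // Order follows identity hash codes, so it may differ between runs or JVMs.
        System.out.println("HashSet:       " + iterationOrder(new HashSet<Pos>(), labels));
        // Order follows insertion, so it is the same on every run.
        System.out.println("LinkedHashSet: " + iterationOrder(new LinkedHashSet<Pos>(), labels));
    }
}
```

If the order of "repeats" really does affect matching, an insertion-ordered collection would at least make the behavior reproducible. It would not by itself show which order gives the correct result.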