From: "Doron Cohen (Updated) (JIRA)"
To: dev@lucene.apache.org
Reply-To: dev@lucene.apache.org
Date: Mon, 5 Mar 2012 19:25:58 +0000 (UTC)
Message-ID: <661816229.23472.1330975558101.JavaMail.tomcat@hel.zones.apache.org>
In-Reply-To: <1961499182.12192.1330035830036.JavaMail.tomcat@hel.zones.apache.org>
Subject: [jira] [Updated] (LUCENE-3821) SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.

     [ https://issues.apache.org/jira/browse/LUCENE-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-3821:
--------------------------------

    Attachment: LUCENE-3821.patch

Attached updated patch.

Repeating PPs (PhrasePositions) in a multi-phrase query are now handled as well.
This called for more cases in the sloppy phrase scorer and more code, and, although I think the code is cleaner now, I don't know to what extent it is easier to maintain.

It definitely fixes wrong behavior that exists in the current 3x and trunk (the patch is for 3x).

However, although the random test passes for me even with -Dtests.iter=2000, it is still possible to "break the scorer", that is, to create a document and a query that should match each other but do not.
The patch adds just such a case as a test case marked @Ignore: TestMultiPhraseQuery.testMultiSloppyWithRepeats().

I don't see how to solve this specific case in the context of the current sloppy phrase scorer.

So there are 3 options:
# Leave things as they are.
# Commit this patch and, for now, document the failing scenario (and keep it in the ignored test case).
# Devise a different algorithm for this.

I would love it to be the 3rd, if only I knew how to do it.
Otherwise I like the 2nd; we just need to keep in mind that the random test might from time to time create this scenario, so there will be noise in the test builds.

Preferences?

> SloppyPhraseScorer sometimes misses documents that ExactPhraseScorer finds.
> ---------------------------------------------------------------------------
>
>                 Key: LUCENE-3821
>                 URL: https://issues.apache.org/jira/browse/LUCENE-3821
>             Project: Lucene - Java
>          Issue Type: Bug
>   Affects Versions: 3.5, 4.0
>            Reporter: Naomi Dushay
>            Assignee: Doron Cohen
>         Attachments: LUCENE-3821.patch, LUCENE-3821.patch, LUCENE-3821.patch, LUCENE-3821.patch, LUCENE-3821_test.patch, schema.xml, solrconfig-test.xml
>
>
> The general bug is a case where a phrase with no slop is found,
> but if you add slop it's not.
> I committed a test today (TestSloppyPhraseQuery2) that actually triggers this case,
> jenkins just hasn't had enough time to chew on it.
> ant test -Dtestcase=TestSloppyPhraseQuery2 -Dtests.iter=100 is enough to make it fail on trunk or 3.x