From: "Doron Cohen (JIRA)"
To: java-dev@lucene.apache.org
Date: Mon, 23 Apr 2007 22:16:15 -0700 (PDT)
Subject: [jira] Updated: (LUCENE-736) Sloppy Phrase Scorer matches the doc "A B C D E" for query = "B C B"~2

[ https://issues.apache.org/jira/browse/LUCENE-736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doron Cohen updated LUCENE-736:
-------------------------------

    Lucene Fields: [New, Patch Available]  (was: [Patch Available, New])
          Summary: Sloppy Phrase Scorer matches the doc "A B C D E" for query = "B C B"~2  (was: Sloppy Phrase Scoring Misbehavior)

Changing the title to match what we decided to fix here.

> Sloppy Phrase Scorer matches the doc "A B C D E" for query = "B C B"~2
> ----------------------------------------------------------------------
>
>                 Key: LUCENE-736
>                 URL: https://issues.apache.org/jira/browse/LUCENE-736
>             Project: Lucene - Java
>          Issue Type: Bug
>          Components: Search
>            Reporter: Doron Cohen
>         Assigned To: Doron Cohen
>            Priority: Minor
>         Attachments: perf-search-new.log, perf-search-orig.log, res-search-new2.log, res-search-orig2.log, sloppy_phrase.patch2.txt, sloppy_phrase.patch3.txt, sloppy_phrase_java.patch.txt, sloppy_phrase_tests.patch.txt
>
>
> This is an extension of https://issues.apache.org/jira/browse/LUCENE-697
> In addition to the abnormalities Yonik pointed out in 697, there seem to be other issues with sloppy phrase search and scoring.
> 1) A phrase with a repeated word is detected in a document even though it is not there.
> E.g. for document = A B D C E, query "B C B" does not find this document (as expected), but query "B C B"~2 does find it.
> I think that no matter how large the slop is, this document should not be a match.
> 2) A document containing both orders of a query, symmetrically, scores differently for the query and for its reversed form.
> E.g. document = A B C B A scores differently for queries "B C"~2 and "C B"~2, although it is symmetric to both.
> I will attach test cases that show both these problems and the one reported by Yonik in 697.

-- 
This message is automatically generated by JIRA.
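A brute-force reference model may help pin down the intended semantics of problem 1. The sketch below is hypothetical: `SloppyPhraseReference`, `minMatchLength` and `matches` are illustrative names, not Lucene API. It assumes, as we understand the sloppy scorer to do, that the match length is the spread of (document position - query offset) across the phrase terms and that a match requires that length to be at most the slop. Unlike Lucene's position-queue algorithm, it simply enumerates every mapping of query terms to document positions. With `distinct = false` it reproduces the reported behaviour (both B's in "B C B" reuse the single B of "A B C D E"); requiring distinct positions makes the document a non-match at any slop. It does not model scoring, so it says nothing about problem 2.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical brute-force model of sloppy phrase matching (not Lucene's implementation). */
public class SloppyPhraseReference {

    /**
     * Smallest match length over all ways of assigning each query term to a
     * document position holding that term, or -1 if no assignment exists.
     * If distinct is true, two query terms may never share one document position.
     */
    static int minMatchLength(String[] doc, String[] query, boolean distinct) {
        return search(doc, query, 0, new int[query.length], new HashSet<Integer>(), distinct);
    }

    private static int search(String[] doc, String[] query, int i, int[] pos,
                              Set<Integer> used, boolean distinct) {
        if (i == query.length) {
            // Assumed match length: spread of (docPosition - queryOffset).
            int min = Integer.MAX_VALUE;
            int max = Integer.MIN_VALUE;
            for (int k = 0; k < query.length; k++) {
                int shifted = pos[k] - k;
                min = Math.min(min, shifted);
                max = Math.max(max, shifted);
            }
            return max - min;
        }
        int best = -1;
        for (int p = 0; p < doc.length; p++) {
            if (!doc[p].equals(query[i])) continue;
            if (distinct && used.contains(p)) continue; // position already taken
            pos[i] = p;
            boolean added = used.add(p);
            int len = search(doc, query, i + 1, pos, used, distinct);
            if (added) used.remove(p);
            if (len >= 0 && (best < 0 || len < best)) best = len;
        }
        return best;
    }

    /** True if some assignment has match length <= slop. */
    static boolean matches(String[] doc, String[] query, int slop, boolean distinct) {
        int len = minMatchLength(doc, query, distinct);
        return len >= 0 && len <= slop;
    }

    public static void main(String[] args) {
        String[] doc = "A B C D E".split(" ");
        String[] query = "B C B".split(" ");
        System.out.println(matches(doc, query, 2, false));  // true: both B's reuse position 1
        System.out.println(matches(doc, query, 2, true));   // false
        System.out.println(matches(doc, query, 100, true)); // false: the doc has only one B
    }
}
```

Here the naive assignment maps B, C, B to positions 1, 2, 1, giving shifted positions 1, 1, -1 and a match length of 2, which is why "B C B"~2 matches in that model.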