Date: 9 Jul 2003 20:46:45 -0000
Message-ID: <20030709204645.4007.qmail@nagoya.betaversion.org>
From: bugzilla@apache.org
To: lucene-dev@jakarta.apache.org
Subject: DO NOT REPLY [Bug 21446] New: - Fuzzy Searches do not get a boost of 0.2 as stated in "Query Syntax" doc

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=21446

           Summary: Fuzzy Searches do not get a boost of 0.2 as stated in
                    "Query Syntax" doc
           Product: Lucene
           Version: 1.2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: Search
        AssignedTo: lucene-dev@jakarta.apache.org
        ReportedBy: cormac@siderean.com

According to the website's "Query Syntax" page, fuzzy searches are given a
boost of 0.2. I've found this not to be the case, and have seen situations
where exact matches have lower relevance scores than fuzzy matches.

Rather than applying a boost of 0.2, the search first finds all variations
of the term in the index for which dist* > 0.5.

* dist = 1 - (levenshteinDistance / min(termLength, variantLength)),
  so an exact match has dist = 1.0

It then runs a boolean OR search over all the variant terms, with each
variant's boost set to (dist - 0.5) * 2.

The upshot is that there are many cases where a fuzzy match will get a
higher relevance score than an exact match.

See this email for a test case that reproduces the anomalous behaviour:
http://www.mail-archive.com/lucene-dev@jakarta.apache.org/msg02819.html

Here is a candidate patch to address the issue. It returns a boost of 1.0
for an exact match and scales the boost of every other variant into the
range 0 to 0.2:

*** lucene-1.2\src\java\org\apache\lucene\search\FuzzyTermEnum.java	Sun Jun 09 13:47:54 2002
--- lucene-1.2-modified\src\java\org\apache\lucene\search\FuzzyTermEnum.java	Fri Mar 14 11:37:20 2003
***************
*** 99,105 ****
      }

      final protected float difference() {
!         return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
      }

      final public boolean endEnum() {
--- 99,109 ----
      }

      final protected float difference() {
!         if (distance == 1.0) {
!             return 1.0f;
!         }
!         else
!             return (float)((distance - FUZZY_THRESHOLD) * SCALE_FACTOR);
      }

      final public boolean endEnum() {
***************
*** 111,117 ****
       ******************************/

      public static final double FUZZY_THRESHOLD = 0.5;
!     public static final double SCALE_FACTOR = 1.0f / (1.0f - FUZZY_THRESHOLD);

      /**
          Finds and returns the smallest of three integers
--- 115,121 ----
       ******************************/

      public static final double FUZZY_THRESHOLD = 0.5;
!     public static final double SCALE_FACTOR = 0.2f * (1.0f / (1.0f - FUZZY_THRESHOLD));

      /**
          Finds and returns the smallest of three integers
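
To make the arithmetic in the report concrete, here is a minimal standalone sketch of the per-variant boost before and after the patch. It is not the Lucene source: the class FuzzyBoostSketch and its methods are hypothetical names, the Levenshtein routine is a textbook implementation, and dist is assumed to be 1 - editDistance / min(length), which is what the patch's "distance == 1.0" check for an exact match implies. Only the two constants and the two difference() formulas are copied from the patch. The sketch covers the boost alone, not the rest of the relevance score, and assumes non-empty terms.

```java
public class FuzzyBoostSketch {
    // Constants as written in the patch.
    static final double FUZZY_THRESHOLD = 0.5;
    static final double ORIGINAL_SCALE = 1.0f / (1.0f - FUZZY_THRESHOLD);
    static final double PATCHED_SCALE = 0.2f * (1.0f / (1.0f - FUZZY_THRESHOLD));

    /** Textbook Levenshtein distance (hypothetical helper, not Lucene's code). */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Assumed definition of dist: 1.0 for an exact match, lower for worse matches. */
    static double dist(String term, String variant) {
        int minLen = Math.min(term.length(), variant.length());
        return 1.0 - (double) editDistance(term, variant) / minLen;
    }

    /** Boost formula from the unpatched difference(). */
    static float originalBoost(double dist) {
        return (float) ((dist - FUZZY_THRESHOLD) * ORIGINAL_SCALE);
    }

    /** Boost formula from the patched difference(). */
    static float patchedBoost(double dist) {
        if (dist == 1.0) {
            return 1.0f;
        }
        return (float) ((dist - FUZZY_THRESHOLD) * PATCHED_SCALE);
    }

    public static void main(String[] args) {
        String term = "lucene";
        for (String variant : new String[] {"lucene", "lucent", "lucern"}) {
            double d = dist(term, variant);
            if (d > FUZZY_THRESHOLD) {
                System.out.println(variant + ": dist=" + d
                        + " original=" + originalBoost(d)
                        + " patched=" + patchedBoost(d));
            }
        }
    }
}
```

Under these assumptions, the unpatched formula gives a variant one edit away from a six-letter term a boost of about 0.67, close to the 1.0 of the exact match, while the patched formula keeps every non-exact variant below 0.2.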