Date: Thu, 23 Aug 2007 23:26:47 -0700 (PDT)
From: Chris Hostetter
To: java-dev@lucene.apache.org
Subject: Re: Request to change "coord" similarity API

: I'm hoping that coord similarity API can be changed from:
:   float coord(int overlap, int maxOverlap)
: ...
:   float coord(int overlap, int maxOverlap, int docSize)

That's a pretty significant change ... especially considering Lucene doesn't know the docSize.

You may want to review the comments in another recent related thread that suggested incorporating the average doc length:

http://www.nabble.com/search-quality---assessment---improvements-tf3974580.html#a11701392

: score. Nothing can help here, changing lengthNorm to intentionally lower
: the score of car names as they get longer doesn't make sense, the "Volvo V70
: Wagon Luxury Edition Sports Package AWD" is just as much of a car as the

The long name may be "just as much of a car" as the short name, but the lengthNorm by itself isn't really important: it's all relative. The lengthNorm is just there to help offset other factors, such as higher tfs and, in the case of larger boolean queries, a higher coord factor.

Regarding your specific problem: other people have solved this using sloppy PhraseQueries (or SpanNearQueries, as in the example below) with extremely large slop, and sentinel terms indexed at the start and end of their field values, i.e.:

  Doc1: _START_ Volvo V70 Wagon _END_
  Doc2: _START_ Volvo V70 Wagon Luxury Edition Sports Package AWD _END_

  User Input: Volvo V70 Wagon
  Query: SpanNearQuery(_START_, Volvo, V70, Wagon, _END_, 10000)

...both docs will match, but Doc1 will match with a much higher score.

-Hoss
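For anyone who wants to see why the sentinel trick favors the shorter value, here is a minimal, self-contained Java sketch written without Lucene so it runs on its own. It models only the core idea: every field value is bracketed with `_START_` and `_END_`, the query clauses are bracketed the same way, and a value with extra tokens needs more slop to match, so a distance-based weight ranks it lower. The class and helper names (`SentinelSpanSketch`, `withSentinels`, `matchSlop`) are hypothetical. `sloppyFreq` only copies the 1/(distance+1) shape of `DefaultSimilarity.sloppyFreq`; Lucene's real span and phrase scorers measure distance in their own way and also apply lengthNorm, idf and coord, so the numbers are illustrative only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified, Lucene-free model of the sentinel-term trick (illustrative only).
class SentinelSpanSketch {
    // Hypothetical sentinel terms: any tokens that never appear in real values.
    static final String START = "_START_";
    static final String END = "_END_";

    // Wrap a whitespace-tokenized value in sentinels. Used for documents and,
    // in this sketch, also to build the list of query clauses.
    static List<String> withSentinels(String value) {
        List<String> tokens = new ArrayList<>();
        tokens.add(START);
        tokens.addAll(Arrays.asList(value.trim().split("\\s+")));
        tokens.add(END);
        return tokens;
    }

    // Greedy in-order match: take the first occurrence of each clause after the
    // previous one. Returns the slop (extra positions inside the match window),
    // or -1 if the clauses do not all occur in order or the slop exceeds maxSlop.
    static int matchSlop(List<String> doc, List<String> clauses, int maxSlop) {
        int first = -1;
        int last = -1;
        for (String clause : clauses) {
            int found = -1;
            for (int i = last + 1; i < doc.size(); i++) {
                if (doc.get(i).equals(clause)) {
                    found = i;
                    break;
                }
            }
            if (found < 0) {
                return -1; // clause missing or out of order: no match
            }
            if (first < 0) {
                first = found;
            }
            last = found;
        }
        int width = last - first + 1;      // positions covered by the match
        int slop = width - clauses.size(); // positions not used by any clause
        return slop <= maxSlop ? slop : -1;
    }

    // Same 1/(distance + 1) shape as DefaultSimilarity.sloppyFreq.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    public static void main(String[] args) {
        List<String> query = withSentinels("Volvo V70 Wagon");
        String[] docs = {
            "Volvo V70 Wagon",
            "Volvo V70 Wagon Luxury Edition Sports Package AWD"
        };
        for (String value : docs) {
            int slop = matchSlop(withSentinels(value), query, 10000);
            System.out.println(value + " -> slop " + slop + ", weight " + sloppyFreq(slop));
        }
    }
}
```

Here the short value matches with slop 0 and the long one with slop 5, so the short one gets the larger weight. In Lucene itself the query side would be a `SpanNearQuery` built from one `SpanTermQuery` per clause (sentinels included) with a very large slop, as in the example above; whether to require in-order matching is a design choice. Treat the sketch only as an illustration of why the shorter value wins, not as a reproduction of Lucene's scores.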