From: Chris Lamprecht
To: java-user@lucene.apache.org
Date: Fri, 10 Feb 2006 02:08:33 -0600
Subject: Re: Help: tweaking search - reducing IDF skew and implementing score cutoff

> 2. If I choose to sort the results by date, then recent documents with
> very low relevancy (say the words searched appear only in
> content, and not in the title/bylines/summary fields that are boosted
> higher) are still shown relatively high in the list, and I wish to
> omit them in general. What is the best way to implement some sort of
> relevancy filter (include only documents with a normalized score of
> 0.2 or more)? Or is there a better way around it?

As Chris pointed out, there isn't always an easy way to do this. Your suggestion of filtering out normalized scores below 0.2 might work, assuming the most relevant document scores 1.0. You'll have to tune this cutoff point and see how well it works.

One thing to watch out for: if the top raw (non-normalized) score is less than 1.0, the scores are not "normalized" at all, so your most relevant document can have a score of less than 1.0. This may or may not be what you want; it's just something to consider. Lucene's Hits.java is where the normalization happens.

-chris
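
For illustration, here is a minimal sketch of the cutoff discussed above, written in plain Java with no Lucene classes. The class and method names (ScoreCutoff, scoreNorm, keepAtOrAbove) are hypothetical and not part of Lucene. It assumes the normalization rule described in the message: scores are scaled down only when the top raw score exceeds 1.0.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not a Lucene class: applies a relevancy cutoff to raw scores.
final class ScoreCutoff {

    // Scale factor under the assumed rule: divide by the top raw score
    // only when it is greater than 1.0, otherwise leave scores unchanged.
    static float scoreNorm(float maxRawScore) {
        return maxRawScore > 1.0f ? 1.0f / maxRawScore : 1.0f;
    }

    // Returns the positions of the scores whose scaled value is at least
    // the cutoff. Input order is preserved, so a date-sorted list stays date-sorted.
    static List<Integer> keepAtOrAbove(float[] rawScores, float cutoff) {
        float max = 0.0f;
        for (float s : rawScores) {
            max = Math.max(max, s);
        }
        float norm = scoreNorm(max);

        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < rawScores.length; i++) {
            if (rawScores[i] * norm >= cutoff) {
                kept.add(i);
            }
        }
        return kept;
    }
}
```

With raw scores of 4.0, 0.4 and 2.0 the scaled values are 1.0, 0.1 and 0.5, so a cutoff of 0.2 drops the middle document. With raw scores of 0.5 and 0.05 nothing is rescaled, the best document stays at 0.5, and the same 0.2 cutoff is effectively stricter. That is the caveat about top scores below 1.0.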