From: yahootintin.11533894@bloglines.com
To: java-user@lucene.apache.org
Date: 29 Jun 2005 22:39:06 -0000
Subject: Re: Strategy for making short documents not bubble to the top?

Hi Jian,

Thanks for the reply. The problem with that is it completely ignores document length. A book that mentions "frog" 5 times in its 2,000 pages should be less relevant than a book that mentions "frog" 4 times in its 4 pages.
I really want to lower the document length weight instead of removing it completely. Any ideas how to do that?

Thanks.

--- java-user@lucene.apache.org wrote:
> Hi,
>
> I would use a pure span- or cover-density-based ranking algorithm, which
> does not take document length into consideration (tweaking whatever is
> currently in the standard Lucene distribution?).
>
> For example, when searching for the keywords "beautiful house", span/cover
> ranking gives a long document and a short document the same rank as long
> as they have the same number of spans/covers (for example,
> "beautiful xxxxxx house" is one cover) and, within each span/cover, the
> editing distance between the keywords is the same.
>
> Just my 2 cents,
>
> Cheers,
>
> Jian
>
> On 29 Jun 2005 20:30:49 -0000, yahootintin.11533894@bloglines.com wrote:
> > Hi,
> >
> > Short documents bubble to the top of the results because the field
> > length is short. Does anyone have a good strategy for working around this?
> > Will doing something like log(document length) flatten out my results while
> > still making them meaningful? I'm going to try some different approaches
> > but any advice is appreciated.
> >
> > Thanks.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
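
The log(document length) idea raised in the thread can be compared against a square-root length norm with a small standalone sketch. This is not Lucene code: the class name DampedLengthNorm and its methods are hypothetical, and the 1 / sqrt(numTerms) baseline is assumed to match what Lucene's DefaultSimilarity.lengthNorm computes. The point is only to show how much flatter a log-based or small-exponent curve is, so that length still counts but short fields get a smaller boost.

```java
// Illustrative sketch only. The class and method names are hypothetical
// and are not part of the Lucene API.
final class DampedLengthNorm {

    // Baseline: 1 / sqrt(numTerms), assumed to mirror Lucene's default norm.
    static double sqrtNorm(int numTerms) {
        return 1.0 / Math.sqrt(Math.max(1, numTerms));
    }

    // Damped variant: 1 / (1 + ln(numTerms)). Still decreases with length,
    // but far more slowly than the square-root curve.
    static double logNorm(int numTerms) {
        return 1.0 / (1.0 + Math.log(Math.max(1, numTerms)));
    }

    // Tunable variant: 1 / numTerms^exponent. An exponent of 0.5 reproduces
    // sqrtNorm; an exponent of 0 ignores length entirely.
    static double powerNorm(int numTerms, double exponent) {
        return 1.0 / Math.pow(Math.max(1, numTerms), exponent);
    }

    public static void main(String[] args) {
        int[] lengths = {4, 100, 10_000, 1_000_000};
        for (int n : lengths) {
            System.out.printf("%9d terms: sqrt=%.4f  log=%.4f  pow(0.25)=%.4f%n",
                    n, sqrtNorm(n), logNorm(n), powerNorm(n, 0.25));
        }
    }
}
```

To apply a curve like this in Lucene itself, the usual route would be a Similarity subclass that overrides lengthNorm and is set on the IndexWriter. Norms are written at index time, so the documents would presumably need to be re-indexed for the change to take effect.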