From: markharw00d
Date: Tue, 26 Sep 2006 08:11:12 +0100
To: java-dev@lucene.apache.org
Subject: Re: highlight - scoring fragments with more of the same token

If you were to score repeated terms then I suspect it would have to be
done so that the repetitions didn't score as highly as the first
occurrence - otherwise f2 could be selected as a better fragment than f3
for the query q1 in your example.

Repetitions of a term in a fragment could be scored as a very small
fraction of the score given to the first occurrence. This would at least
rank f2 higher than f1 for query q2.

Another potentially useful ranking factor may be to boost fragments found
at the beginning of a document - that's where people tend to write
summaries or introductions.

Doron Cohen wrote:
> This question was raised in the user's list -
> http://www.nabble.com/highlighting-tf2322109.html
>
> Assume three fragments and two queries:
> f1 = aa 11 bb 33 cc
> f2 = aa 11 bb 11 cc
> f3 = aa 11 bb 22 cc
> q1 = 11 22
> q2 = 11
> Now we call highlighter.getBestFragment(q);
> For q1, f3 is returned, as expected.
> For q2, f1 is returned, although "11" appears twice in f2 but only once in
> f1.
>
> This is because QueryScorer.getTokenScore(Token) counts only unique
> fragment tokens.
>
> Would it make sense to make this behavior controllable?
> (It is easily done but I am not sure about the consequences.)
>
> Or perhaps there is a way to achieve this behavior (preferring f2 on f1 for
> q2 above) that I missed?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-dev-help@lucene.apache.org
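
To make the repetition idea above concrete, here is a rough sketch of how it might look. It is not the Highlighter's actual code: the class RepeatDampedScorer, its score method and the REPEAT_WEIGHT constant are all hypothetical names, and every query term is assumed to carry a weight of 1.0 (real term weights would differ).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not part of Lucene: the first occurrence of a query
// term in a fragment scores fully, each repetition scores a small fraction.
final class RepeatDampedScorer {

    // Assumed value; it should be small enough that repetitions of one term
    // cannot outweigh a match on a second, distinct query term.
    static final double REPEAT_WEIGHT = 0.01;

    static double score(List<String> fragmentTokens, Set<String> queryTerms) {
        Set<String> seen = new HashSet<>();
        double score = 0.0;
        for (String token : fragmentTokens) {
            if (!queryTerms.contains(token)) {
                continue; // not a query term, contributes nothing
            }
            // Set.add returns true only the first time the term is seen
            score += seen.add(token) ? 1.0 : REPEAT_WEIGHT;
        }
        return score;
    }

    public static void main(String[] args) {
        // Fragments and queries taken from the example in the quoted message
        List<String> f1 = Arrays.asList("aa", "11", "bb", "33", "cc");
        List<String> f2 = Arrays.asList("aa", "11", "bb", "11", "cc");
        List<String> f3 = Arrays.asList("aa", "11", "bb", "22", "cc");
        Set<String> q1 = new HashSet<>(Arrays.asList("11", "22"));
        Set<String> q2 = new HashSet<>(Arrays.asList("11"));

        System.out.println("q1: f1=" + score(f1, q1) + " f2=" + score(f2, q1)
                + " f3=" + score(f3, q1));
        System.out.println("q2: f1=" + score(f1, q2) + " f2=" + score(f2, q2)
                + " f3=" + score(f3, q2));
    }
}
```

With these assumptions f3 still comes out ahead of f2 for q1, while f2 edges ahead of f1 for q2. A fragment with a very large number of repetitions could still overtake one that matches more distinct terms, so the fraction (or a cap on counted repetitions) would need some care.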