From: "Ard Schrijvers"
Reply-To: users@jackrabbit.apache.org
Subject: RE: Scoring question
Date: Mon, 8 Sep 2008 10:26:54 +0200

Flo, could you please stop cross-posting the same mails to both the user and the dev list? Your mails are clearly user questions, so please stick to this list.
Furthermore, I think you need to look at the Lucene scoring algorithm if you want this kind of behavior implemented. See [1].

Also, IMHO the scoring algorithm you want seems awkward: a document of 10 words containing 'jackrabbit' 5 times would score lower than a document of 10,000 words containing 'jackrabbit' 6 times.

Anyway, what you want is Lucene expert level (though your algorithm doesn't seem too hard to implement), and it is off-topic on this list.

Regards,
Ard

[1] http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/package-summary.html#scoring

>
> Hi everybody,
>
> I have a question regarding custom scoring:
> I want to implement scoring so that the score of a document
> is simply the number of occurrences of the search terms in the document.
> No special rules about term length, occurrences in other documents, etc.
>
> Assuming that only jcr:content/@jcr:data is indexed, a
> document with the content 'This is a test document of jackrabbit
> scoring mechanism, just a test document'
> should always get a score of 3
> for the search
> 'test scoring'
>
> Does anyone have an idea how to achieve this most easily?
> Is there already anything? Or if not, which classes do I need to
> subclass? Just Scorer and Weight? I think Similarity is not
> necessary (see MatchAllScorer)?!? Or maybe even Query?
>
> I thought about something like this (in a new 'HitScorer' class):
>
>   public float score() throws IOException {
>       // sum the frequencies of all terms in the field's term vector
>       TermFreqVector tfv = reader.getTermFreqVector(nextDoc, "jcr:content");
>       int[] freqs = tfv.getTermFrequencies();
>       int sum = 0;
>       for (int i = 0; i < freqs.length; i++) {
>           sum += freqs[i];
>       }
>       return sum;
>   }
>
> But what should I do in Weight.getSumOfSquaredWeights and
> Weight.normalize? Just return 1.0f? And is the property name
> correct? I admit I am a bit confused by the
> DefaultSimilarity formula(s)...
>
> Thanks a lot, best regards,
> Flo
>
> --
> View this message in context:
> http://www.nabble.com/Scoring-question-tp19331007p19331007.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
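To make the 10-word vs 10,000-word comparison in the reply concrete, here is a small stand-alone Java sketch (no Lucene dependency) that puts the raw-count rule next to a simplified version of Lucene's default weighting. It assumes the Lucene 2.x DefaultSimilarity factors tf(freq) = sqrt(freq) and lengthNorm = 1/sqrt(numTerms), and deliberately ignores idf, boosts, queryNorm and the lossy one-byte encoding of norms. The class and method names are made up for illustration.

```java
import java.util.Locale;

// Hypothetical comparison of two scoring rules; this is not Lucene code.
public class ScoringComparison {

    // The rule asked for in the question: score = number of occurrences.
    static float rawCount(int freq) {
        return freq;
    }

    // Simplified DefaultSimilarity-style weighting (assumption: Lucene 2.x
    // formulas tf = sqrt(freq), lengthNorm = 1 / sqrt(numTerms)); idf, boosts,
    // queryNorm and the one-byte norm encoding are left out on purpose.
    static float tfTimesLengthNorm(int freq, int numTerms) {
        return (float) (Math.sqrt(freq) / Math.sqrt(numTerms));
    }

    public static void main(String[] args) {
        int shortFreq = 5, shortLength = 10;     // 'jackrabbit' 5 times in 10 words
        int longFreq = 6, longLength = 10_000;   // 'jackrabbit' 6 times in 10,000 words

        System.out.printf(Locale.ROOT, "raw count:       short=%.4f long=%.4f%n",
                rawCount(shortFreq), rawCount(longFreq));
        System.out.printf(Locale.ROOT, "tf * lengthNorm: short=%.4f long=%.4f%n",
                tfTimesLengthNorm(shortFreq, shortLength),
                tfTimesLengthNorm(longFreq, longLength));
    }
}
```

With the raw count the 10,000-word document wins (6 vs 5); with the length-normalized factors the 10-word document scores about 0.71 against about 0.02, which is why the raw-count rule looks odd next to Lucene's defaults.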
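For comparison with the quoted score() method, here is a minimal, Lucene-free Java sketch of the rule described in the question: count how often each search term occurs in the document text and add the counts up. It assumes a very crude tokenizer (lower-case, split on anything that is not a letter or digit) instead of the analyzer Jackrabbit actually uses, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of "score = occurrences of the search terms"; not Jackrabbit or Lucene code.
public class OccurrenceScore {

    // Crude stand-in for an analyzer: lower-case and split on non-alphanumerics.
    // A real Lucene analyzer may also drop stop words or stem terms.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // Counts how often each distinct query term occurs in the document and sums the counts.
    static int score(String document, String query) {
        Map<String, Integer> freqs = new HashMap<>();
        for (String token : tokenize(document)) {
            freqs.merge(token, 1, Integer::sum);
        }
        Set<String> queryTerms = new LinkedHashSet<>(tokenize(query)); // ignore repeated query terms
        int sum = 0;
        for (String term : queryTerms) {
            sum += freqs.getOrDefault(term, 0);
        }
        return sum;
    }

    public static void main(String[] args) {
        String doc = "This is a test document of jackrabbit scoring mechanism, just a test document";
        System.out.println(score(doc, "test scoring")); // prints 3
    }
}
```

Unlike the quoted score(), which adds up the frequencies of every term in the term vector, this only counts the query terms, which is what yields 3 for 'test scoring' on the example sentence (2 x 'test' + 1 x 'scoring').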
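On the getSumOfSquaredWeights/normalize question: as far as I know, in Lucene 2.x Query.weight(Searcher) calls the weight's sumOfSquaredWeights(), passes the result to Similarity.queryNorm() (1/sqrt(sum) in DefaultSimilarity) and hands that value back through normalize(). The Lucene-free Java sketch below models only that handshake, using hypothetical SimpleWeight and ConstantWeight types, to show that a weight returning 1.0f ends up with a query norm of 1.0 and therefore an unchanged value.

```java
// Hypothetical, Lucene-free model of the query-normalization handshake.
public class QueryNormSketch {

    // Minimal stand-in for the Weight methods asked about (not the real Lucene interface).
    interface SimpleWeight {
        float sumOfSquaredWeights();
        void normalize(float queryNorm);
        float getValue();
    }

    // A weight for raw-count scoring: starts at 1 and multiplies in the query norm.
    static class ConstantWeight implements SimpleWeight {
        private float value = 1.0f;

        public float sumOfSquaredWeights() {
            return value * value;
        }

        public void normalize(float queryNorm) {
            value *= queryNorm;
        }

        public float getValue() {
            return value;
        }
    }

    // Same formula as DefaultSimilarity.queryNorm: 1 / sqrt(sumOfSquaredWeights).
    static float queryNorm(float sumOfSquaredWeights) {
        return (float) (1.0 / Math.sqrt(sumOfSquaredWeights));
    }

    // Assumed to mirror the order of calls in Lucene 2.x Query.weight(Searcher).
    static SimpleWeight prepare(SimpleWeight weight) {
        float norm = queryNorm(weight.sumOfSquaredWeights());
        weight.normalize(norm);
        return weight;
    }

    public static void main(String[] args) {
        System.out.println(prepare(new ConstantWeight()).getValue()); // prints 1.0
    }
}
```

One caveat, again as I understand Lucene 2.x: when such a weight is a clause of a BooleanQuery, the clauses' sums are added up before queryNorm is computed, so with DefaultSimilarity two clauses each returning 1.0f would get a norm of 1/sqrt(2), about 0.707. Having normalize() ignore its argument, or using a Similarity whose queryNorm() returns 1, sidesteps that.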