jackrabbit-dev mailing list archives

From flopsi73 <flo...@flopsi.org>
Subject Scoring question
Date Fri, 05 Sep 2008 13:17:27 GMT

Hi everybody,

I have a question regarding custom scoring:
I want to implement a scoring scheme in which the score of a document is simply
the number of occurrences of the search terms in that document. No special rules
about term length, occurrences in other documents, etc.

Assuming that only jcr:content/@jcr:data is indexed, a document with the
content
'This is a test document of jackrabbit scoring mechanism, just a test
document'
should always get a score of 3 for the search
'test scoring'
('test' occurs twice and 'scoring' once).
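
To make the intended behaviour concrete, here is a minimal plain-Java sketch of the score I am after, independent of Lucene and Jackrabbit. The class name HitCount and the naive tokenization (lowercase, split on anything that is not a letter or digit) are assumptions for illustration only; the analyzer used by the real index may tokenize differently.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class HitCount {

    // Naive tokenizer: lowercase, then split on runs of non-letter/non-digit
    // characters. This is an assumption, not the Jackrabbit analyzer.
    static String[] tokenize(String text) {
        return text.toLowerCase().split("[^\\p{L}\\p{N}]+");
    }

    // Score = total number of occurrences of the query terms in the document.
    static int score(String document, String query) {
        Set<String> terms = new HashSet<String>(Arrays.asList(tokenize(query)));
        int sum = 0;
        for (String token : tokenize(document)) {
            // Skip the empty token that split() can produce at the start.
            if (token.length() > 0 && terms.contains(token)) {
                sum++;
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        String doc = "This is a test document of jackrabbit scoring mechanism, "
                + "just a test document";
        System.out.println(score(doc, "test scoring"));
    }
}
```

No term length, document length or collection statistics enter the result; only the raw counts do.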

Does anyone have an idea how to achieve this most easily? Is there already
something like this? If not, which classes need to be subclassed? Just Scorer
and Weight? I think Similarity is not necessary (see MatchAllScorer)? Or maybe
even Query?

I thought about something like this (in a new 'HitScorer' class):

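	public float score() throws IOException {
		// "jcr:content" is my guess at the field name (see question below)
		TermFreqVector tfv = reader.getTermFreqVector(nextDoc, "jcr:content");
		if (tfv == null)
			return 0; // no term vector stored for this document/field
		int[] freqs = tfv.getTermFrequencies();
		int sum = 0;
		// queryTerms: String[] of the search terms (assumed field of HitScorer)
		for (int i = 0; i < queryTerms.length; i++) {
			int idx = tfv.indexOf(queryTerms[i]); // -1 if term is not in the doc
			if (idx >= 0)
				sum += freqs[idx];
		}
		return sum;
	}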

But what should Weight.sumOfSquaredWeights and Weight.normalize do? Just use
1.0f? And is the property name correct? I admit I am a bit confused about
the DefaultSimilarity formula(s)...
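
For reference, this is my current reading of the per-term factors in Lucene's DefaultSimilarity, written as a plain-Java sketch rather than as a Similarity subclass. The class and method names (SimilarityFactors, defaultTf, flatTf and so on) are made up for illustration; only the formulas in the "default" methods are meant to mirror DefaultSimilarity. The "flat" variants are what I assume a pure hit count would need instead.

```java
public class SimilarityFactors {

    // Factors as I understand DefaultSimilarity to compute them.
    static float defaultTf(float freq) {
        return (float) Math.sqrt(freq);
    }

    static float defaultIdf(int docFreq, int numDocs) {
        return (float) (Math.log(numDocs / (double) (docFreq + 1)) + 1.0);
    }

    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    // Flat factors (assumption): the raw frequency passes through unchanged
    // and neither collection statistics nor field length have any influence.
    static float flatTf(float freq) {
        return freq;
    }

    static float flatIdf(int docFreq, int numDocs) {
        return 1.0f;
    }

    static float flatLengthNorm(int numTerms) {
        return 1.0f;
    }

    public static void main(String[] args) {
        // 'test' occurs twice in the 13-term example document.
        System.out.println("default tf:   " + defaultTf(2));
        System.out.println("default norm: " + defaultLengthNorm(13));
        System.out.println("flat tf:      " + flatTf(2));
        System.out.println("flat norm:    " + flatLengthNorm(13));
    }
}
```

If that reading is right, the square root in tf, the idf term and the length norm are exactly the "special rules" I would like to switch off.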

Thanks a lot, best regards
Flo

-- 
View this message in context: http://www.nabble.com/Scoring-question-tp19331034p19331034.html
Sent from the Jackrabbit - Dev mailing list archive at Nabble.com.

