jackrabbit-users mailing list archives

From "Ard Schrijvers" <a.schrijv...@onehippo.com>
Subject RE: Scoring question
Date Mon, 08 Sep 2008 08:26:54 GMT
Flo, can you please stop crossposting the same mails to the user and dev
lists? Your mails are clearly user questions, so please stick to this
list.

Furthermore, I think you will have to take a look at the Lucene scoring
algorithm if you want this kind of behavior implemented. See [1].

Also, IMHO the scoring algorithm you want seems awkward: a document with
10 words, 5 of which are 'jackrabbit', would score lower than a document
with 10,000 words and 6 occurrences of 'jackrabbit'.
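
To make that concrete, here is a minimal plain-Java sketch of the raw
occurrence count described in the quoted question below. It does not use
the Lucene or Jackrabbit APIs: the class RawCountScore, its methods, and
the crude tokenizer are all hypothetical stand-ins for illustration only.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper for illustration; not part of Lucene or Jackrabbit. */
final class RawCountScore {

    /** Lowercases and splits on non-alphanumeric characters (a crude stand-in for an analyzer). */
    static String[] tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{Alnum}]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    /** Score = total number of occurrences of any query term, with no length or idf normalization. */
    static int score(String content, String query) {
        Set<String> queryTerms = new HashSet<>(Arrays.asList(tokenize(query)));
        int sum = 0;
        for (String token : tokenize(content)) {
            if (queryTerms.contains(token)) {
                sum++;
            }
        }
        return sum;
    }

    /** Builds a document with the given number of hits, padded with filler words to totalWords. */
    static String buildDoc(String term, int hits, int totalWords) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < totalWords; i++) {
            sb.append(i < hits ? term : "filler").append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String example = "This is a test document of jackrabbit scoring mechanism, just a test document";
        System.out.println(score(example, "test scoring"));                          // prints 3

        // The two documents from the comparison above.
        System.out.println(score(buildDoc("jackrabbit", 5, 10), "jackrabbit"));      // prints 5
        System.out.println(score(buildDoc("jackrabbit", 6, 10000), "jackrabbit"));   // prints 6
    }
}
```

With a pure count, the 10,000-word document outranks the 10-word one even
though 'jackrabbit' makes up half of the short document.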

Anyway, what you want is Lucene expert-level material (though your
algorithm doesn't seem too hard to implement), and off topic on this list.

Regards Ard

[1]
http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/package-summary.html#scoring

> 
> Hi everybody,
> 
> I have a question regarding custom scoring:
> I want to implement a scoring so that the score of a document 
> is just equal to the number of occurrences of the query terms in the document. 
> No special rules about term length, occurrences in other documents etc.
> 
> Assuming that only jcr:content/@jcr:data is indexed, a 
> document with the content 'This is a test document of jackrabbit 
> scoring mechanism, just a test document'
> should always get a score of 3
> for the search
> 'test scoring'
> 
> Does anyone have an idea on how to achieve this most easily? 
> Is there already anything? If not, which classes should I 
> subclass? Just Scorer and Weight? I think Similarity is not 
> necessary (see MatchAllScorer)?!? Or maybe even Query?
> 
> I thought about something like this (in a new 'HitScorer' class):
> 
> 	public float score() throws IOException {
> 		TermFreqVector tfv = reader.getTermFreqVector(nextDoc, "jcr:content");
> 		int[] freqs = tfv.getTermFrequencies();
> 		int sum = 0;
> 		for (int i = 0; i < freqs.length; i++)
> 			sum += freqs[i];
> 		return sum;
> 	}
> 
> But what should I do in Weight.getSumOfSquaredWeights and 
> Weight.normalize? Just return 1.0f? And is the property name 
> correct? I admit I am a bit confused about the 
> DefaultSimilarity formula(s)...
> 
> Thanks a lot, best regards
> Flo
> 
> --
> View this message in context: 
> http://www.nabble.com/Scoring-question-tp19331007p19331007.html
> Sent from the Jackrabbit - Users mailing list archive at Nabble.com.
> 
> 
