jackrabbit-dev mailing list archives

From Martin Perez <mper...@gmail.com>
Subject Re: Jackrabbit search score algorithm
Date Mon, 23 Jan 2006 10:52:15 GMT
Thanks for the quick fix, Marcel. This afternoon I'll check whether it works
with my queries. I expect it will, as they are simple ones.

Martin

On 1/23/06, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
>
> ah, now that I'm using a more complex path constraint I also get scores
> that are always 1000.
>
> this is due to the current implementation of the descendant-or-self axis
> in jackrabbit. it does not propagate the score of the sub query.
>
> this is now fixed in svn revision: 371520
>
> thanks for reporting this issue.
>
> please note that there might be other query statements where the score
> value is not propagated to the final query result.
> e.g:
> //*[jcr:contains(.,'foo')]/bar
>
> will not return the proper score for 'foo' but just 1000 for the node
> name match 'bar'. the reason is basically efficiency. but if there is a
> need to support more sophisticated scoring for such queries we can
> implement this at the cost of slightly slower queries.
>
> regards
>   marcel
>
> Martin Perez wrote:
> > I'm using this query:
> >
> > statement = "/jcr:root"+ a path
> > +"//element(*,nt:resource)[jcr:contains(.,'phrase')]"
> >
> > And for getting the score:
> >
> >             // Rows carry jcr:score; nodes come from a parallel iterator.
> >             javax.jcr.query.RowIterator it = result.getRows();
> >             NodeIterator nodeIterator = result.getNodes();
> >             while (it.hasNext()) {
> >                 javax.jcr.query.Row row = it.nextRow();
> >                 javax.jcr.Node node = nodeIterator.nextNode();
> >                 double score = row.getValue(JCRConstants.JCR_SCORE).getDouble();
> >             }
> >
> > I think the query is OK; in fact, it returns several hits. But all of them
> > have a score of 1000.
> >
> > Could I be reading the score incorrectly? Or could it be due to my
> > Jackrabbit version?
> >
> > Martin
> >
> > On 1/23/06, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
> >
> >>Jackrabbit uses the default lucene algorithm [1] to calculate the score
> >>for a jcr:contains clause. any other query element will usually return a
> >>score of 1000.
> >>
> >>a quick test showed the following for the query:
> >>
> >>//*[jcr:contains(.,'apache')] order by @jcr:score descending
> >>
> >>jcr:score  |  text property
> >>----------------------------------------------------------------------
> >>1000       | "Apache Jackrabbit"
> >>848        | "some test jackrabbit apache, apache is great"
> >>350        | "this is a text that is much larger than the first one" +
> >>              "and only contains the word apache once."
> >>
> >>regards
> >>  marcel
> >>
> >>[1] http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
> >>
> >>Martin Perez wrote:
> >>
> >>>Hi.
> >>>
> >>>I'm searching for some words in Jackrabbit (a month-old release, sorry if
> >>>this has changed) string properties and binary content, and every result
> >>>comes with a jcr:score of 1000.
> >>>
> >>>What algorithm is used? Is that result OK? I was expecting something
> >>>like a score based on the number of occurrences or something similar.
> >>>
> >>>Martin
> >>>
> >>
> >>
> >
>
>
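
To illustrate why the three texts in Marcel's table rank the way they do, here is a minimal sketch of the two factors in Lucene's classic default Similarity that matter most for a single-term query: tf = sqrt(frequency) and a length norm of 1/sqrt(number of terms). It is not Jackrabbit's or Lucene's actual code. The class name ScoreSketch, the naive tokenizer, and the scaling of the best hit to 1000 are assumptions made only to mirror the table; idf, stop-word removal, and Lucene's lossy norm encoding are left out, so the numbers will not match the table exactly.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical illustration class; not part of Jackrabbit or Lucene.
public class ScoreSketch {

    // Lowercase and split on non-letters; a stand-in for a real analyzer.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase().split("[^a-z]+"))
                .filter(t -> !t.isEmpty())
                .collect(Collectors.toList());
    }

    // tf * lengthNorm: sqrt(term frequency) / sqrt(number of terms in the text).
    static double rawScore(String term, String text) {
        List<String> tokens = tokenize(text);
        long freq = tokens.stream().filter(term::equals).count();
        if (freq == 0) {
            return 0.0;
        }
        return Math.sqrt(freq) / Math.sqrt(tokens.size());
    }

    // Assumption: scale so that the best hit gets 1000, as in the table.
    static long[] scaledScores(String term, String... texts) {
        double[] raw = Arrays.stream(texts)
                .mapToDouble(t -> rawScore(term, t))
                .toArray();
        double max = Arrays.stream(raw).max().orElse(0.0);
        return Arrays.stream(raw)
                .mapToLong(r -> max == 0.0 ? 0L : Math.round(1000.0 * r / max))
                .toArray();
    }

    public static void main(String[] args) {
        long[] scores = scaledScores("apache",
                "Apache Jackrabbit",
                "some test jackrabbit apache, apache is great",
                "this is a text that is much larger than the first one"
                        + " and only contains the word apache once.");
        System.out.println(Arrays.toString(scores));
    }
}
```

The short text wins because its single match is spread over only two terms; the second text matches twice but is longer, and the third matches once in a much longer text. That ordering is the same as in the table, even though this simplified model does not reproduce the exact values.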
