jackrabbit-dev mailing list archives

From Ard Schrijvers <a.schrijv...@onehippo.com>
Subject Re: Confusion regarding order of search results
Date Wed, 24 Jun 2009 14:18:32 GMT
On Wed, Jun 24, 2009 at 3:32 PM, Marcel Reutegger
<marcel.reutegger@gmx.net> wrote:

> Hi,
>
> On Wed, Jun 24, 2009 at 14:47, Ard Schrijvers <a.schrijvers@onehippo.com>
> wrote:
> > Hello,
> >
> > I am confused regarding the order of search results (jr 1.5.2 core).
> > First of all, I have configured <param name="respectDocumentOrder"
> > value="false"/>.
> >
> > Thus, I would expect that if I do something like:
> >
> > //*[jcr:contains(.,'foo')]
> >
> > that my ordering would be by @jcr:score (lucene score) descending.
>
> no, it just means that the result nodes aren't necessarily in document
> order.



What do you mean, 'no'? I said I would expect it, and IMO it makes sense: if
you do not specify an order in Lucene, you get the highest scores back
first.


>
>
> > If
> > I print the scores, they seem instead to be random. So, this is not
> > what i would expect.
> >
> > Secondly, when I do
> >
> > //*[jcr:contains(.,'foo')]  order by @jcr:score
> >
> > it makes sense to get results back in descending score order.
>
> why? that's contrary to how order by is defined.



Yeah, I understand, but I think the definition would have been much better if
it had made an exception for the score. Who would ever want the results with
the lowest score first? Such queries must be really rare, IMO.


>
>
> > Unfortunately, they are in ascending order. This seems to be in line
> > with the spec, 6.6.3.5 Ordering Specifier ('If neither ascending nor
> > descending is specified after a property name (or jcr:score(...)
> > function), the default is ascending.'), but, it does not make sense
> > for the jcr:score. It is strange.
>
> no, it isn't :) it's just what you requested. order the result by
> their score value in ascending order.


If I do not specify a direction, I get the lowest score first. What I am
trying to say is that this is a really odd default for score. I think this
is quite straightforward. (Furthermore, though I must check it, if I recall
correctly it is far more expensive for a Lucene query to return the lowest
scores than the highest.)


>
>
> > And beyond that, it will lead to
> > really strange behavior in combination with setLimit i think: IIUC,
> > order by @jcr:score is just the default lucene scoring.
>
> no, that's not correct. order by @jcr:score descending is the default
> lucene scoring.
>

Sorry, I was not clear enough. What I am saying is that if in SearchIndex you
do:

hits = searcher.search(query);

then, AFAIK, you get a Lucene Hits object sorted on score (descending, which
makes sense as a default). For some reason, the order is shuffled by the time
I get the nodes back, and I don't understand why. An order by score
(descending) would make a perfect default when respectDocumentOrder is set to
false.


>
> > This means, if
> > I sort on 'order by @jcr:score' then, the first hit (lowest score)
> > depends on my setLimit. If I do setLimit(1), I get the first
> > authorized lucene hit, which has the highest score possible.
>
> are you sure, this is the case? if yes, then this is a bug and should
> be fixed. it should return the least relevant node.


I was not sure; I had only reasoned about it. But now I see:

if (NameConstants.JCR_SCORE.equals(orderProps[i])) {
    // order on jcr:score does not use the natural order as
    // implemented in lucene. score ascending in lucene means that
    // higher scores are first. JCR specs that lower score values
    // are first.
    sortFields.add(new SortField(null, SortField.SCORE, orderSpecs[i]));
}


so my claim is incorrect
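
To illustrate why the limit does not change which end of the ranking comes
back, here is a toy Java model. It is not Jackrabbit code: the class name
ScoreOrderLimit, the method topHits and the score values are all made up for
illustration. The only assumption taken from the snippet above is that the
sort is applied to the whole result before the limit cuts it off.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ScoreOrderLimit {

    /**
     * Toy model: sort all scores first, then keep the first `limit` entries.
     * Hypothetical helper, not part of Jackrabbit or Lucene.
     */
    static List<Double> topHits(List<Double> scores, boolean descending, int limit) {
        List<Double> sorted = new ArrayList<>(scores);
        Comparator<Double> byScore = Comparator.naturalOrder();
        sorted.sort(descending ? byScore.reversed() : byScore);
        // Clamp the limit so that oversized or negative limits are harmless.
        int end = Math.max(0, Math.min(limit, sorted.size()));
        return new ArrayList<>(sorted.subList(0, end));
    }

    public static void main(String[] args) {
        List<Double> scores = Arrays.asList(0.42, 0.91, 0.10, 0.77);
        // Ascending (the spec default for "order by @jcr:score"), limit 1:
        System.out.println(topHits(scores, false, 1)); // prints [0.1]
        // Explicit descending, limit 1:
        System.out.println(topHits(scores, true, 1));  // prints [0.91]
    }
}
```

In this model a limit of 1 with the default ascending order yields the least
relevant hit, which matches what Marcel describes as the correct behavior.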


>
> > If I do
> > setLimit(1000000000) I first get the lowest score as spec defaults to
> > inverting (ascending) the lucene order. So the combination jcr:score
> > which defaults to ascending, is not useful imo, and with a setLimit()
> > returns quite unexpected results.
> >
> > I am not sure whether jsr-283 has some changes regarding this?
>
> no, this is still the same. higher score value means more relevant,
> hence you have to sort descending to get the most relevant first.


Perhaps a matter of taste...I would have always opted for the highest scores
first as a default, and thus make an explicit exception for score.

Bottom line, I'll default to adding 'order by @jcr:score descending' when no
order is specified :-))
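
A minimal sketch of that workaround, assuming the query is built as an XPath
string before being handed to the query manager. The class DefaultScoreOrder
and the method withDefaultScoreOrder are hypothetical names, and the check for
an existing "order by" is deliberately naive: it would misfire if that phrase
appeared inside a search-term literal.

```java
import java.util.Locale;

public class DefaultScoreOrder {

    // Clause appended when the caller did not ask for any ordering.
    static final String SCORE_DESC = " order by @jcr:score descending";

    /**
     * Returns the query unchanged if it already contains an "order by"
     * clause, otherwise appends "order by @jcr:score descending".
     * Hypothetical helper; the substring check is an assumption, not a parser.
     */
    static String withDefaultScoreOrder(String xpath) {
        String trimmed = xpath.trim();
        if (trimmed.toLowerCase(Locale.ROOT).contains(" order by ")) {
            return trimmed;
        }
        return trimmed + SCORE_DESC;
    }

    public static void main(String[] args) {
        // No ordering given: the descending score order is appended.
        System.out.println(withDefaultScoreOrder("//*[jcr:contains(.,'foo')]"));
        // Explicit ordering given: the query is left as the caller wrote it.
        System.out.println(withDefaultScoreOrder(
                "//*[jcr:contains(.,'foo')] order by @jcr:score"));
    }
}
```

The resulting string would then be passed to the JCR query API as usual; an
explicit ordering from the caller is left alone, so ascending score order
stays available to anyone who really wants it.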

thx for your replies Marcel

Ard


>
> regards
>  marcel
>
