jackrabbit-dev mailing list archives

From Marcel Reutegger <marcel.reuteg...@gmx.net>
Subject Re: Confusion regarding order of search results
Date Thu, 25 Jun 2009 07:16:04 GMT
Hi,

2009/6/24 Ard Schrijvers <a.schrijvers@onehippo.com>
>
>
> On Wed, Jun 24, 2009 at 3:32 PM, Marcel Reutegger <marcel.reutegger@gmx.net> wrote:
>>
>> Hi,
>>
>> On Wed, Jun 24, 2009 at 14:47, Ard Schrijvers<a.schrijvers@onehippo.com> wrote:
>> > Hello,
>> >
>> > I am confused regarding the order of search results (jr 1.5.2 core).
>> > First of all, I have configured <param name="respectDocumentOrder"
>> > value="false"/>.
>> >
>> > Thus, I would expect that if I do something like:
>> >
>> > //*[jcr:contains(.,'foo')]
>> >
>> > that my ordering would be by @jcr:score (lucene score) descending.
>>
>> no, it just means that the result nodes aren't necessarily in document order.
>
> what do you mean 'no'...I said I would expect it, and IMO it makes sense.

I'm sorry. I didn't mean to be rude.

> If in lucene you do not specify an order, you'll get back the highest scores first.

that's true. however I'm not sure this is always desirable. what if I'm
just interested in the result set and not a particular order? then the
ordering is irrelevant and just costs more CPU cycles.

>>
>>
>> > If
>> > I print the scores, they seem instead to be random. So, this is not
>> > what i would expect.
>> >
>> > Secondly, when I do
>> >
>> > //*[jcr:contains(.,'foo')]  order by @jcr:score
>> >
>> > it makes sense to get results back in descending score order.
>>
>> why? that's contrary to how order by is defined.
>
> Yeah I understand, and I think the definition would have been way better if it would have
> made an exception for the score: Who would ever want to have the results with the lowest
> score first? Those queries must be really rare imo.

at some point in the discussion of the EG for JSR 170 the score was
actually named rank, which would have better matched your expectation.
assuming that a lower rank value is better. but then it was decided to
call it score where higher values mean more relevant.

>> > Unfortunately, they are in ascending order. This seems to be inline
>> > with the spec, 6.6.3.5 Ordering Specifier ('If neither ascending nor
>> > descending is specified after a property name (or jcr:score(...)
>> > function), the default is ascending.'), but, it does not make sense
>> > for the jcr:score. It is strange.
>>
>> no, it isn't :) it's just what you requested. order the result by
>> their score value in ascending order.
>
> If I do not specify an order, I get the lowest score first. I try to say that that is
> really an odd default for score. I think this is quite straightforward (furthermore, though
> I must check it, it is, if I recall correctly, far more expensive with a lucene query to get
> the lowest scores instead of the highest)
>
>>
>> > And beyond that, it will lead to
>> > really strange behavior in combination with setLimit i think: IIUC,
>> > order by @jcr:score is just the default lucene scoring.
>>
>> no, that's not correct. order by @jcr:score descending is the default
>> lucene scoring.
>
> Sry, I was not clear enough: what I am saying, is that if in SearchIndex you do:
>
> hits = searcher.search(query);
>
> then afaik, you'll get a Hits object in lucene sorted on score (descending, as this makes
> sense as a default). For some reason, the ordering when I get back the nodes is shuffled. I
> don't understand why.

the technical reason is that jackrabbit creates a lucene Sort instance
with an empty fields array. IIUC that will result in nodes being
ordered by lucene document number.
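
To illustrate the effect, here is a minimal sketch in plain Java. It is not
the actual lucene or jackrabbit code: Hit and HitOrderSketch are made-up
names, and the document numbers and scores are invented for the example. It
only shows why results ordered by document number look random with respect
to score, and how ascending vs. descending score order interacts with a
limit.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical stand-in for a search hit: a document number plus a score.
final class Hit {
    final int doc;
    final float score;

    Hit(int doc, float score) {
        this.doc = doc;
        this.score = score;
    }
}

final class HitOrderSketch {

    // Order by document number, ignoring the score entirely.
    static List<Integer> byDocNumber(List<Hit> hits, int limit) {
        Comparator<Hit> cmp = Comparator.comparingInt(h -> h.doc);
        return hits.stream().sorted(cmp).limit(limit)
                .map(h -> h.doc).collect(Collectors.toList());
    }

    // Order by score; ascending puts the least relevant hit first.
    static List<Integer> byScore(List<Hit> hits, boolean descending, int limit) {
        Comparator<Hit> cmp = Comparator.comparingDouble(h -> h.score);
        if (descending) {
            cmp = cmp.reversed();
        }
        return hits.stream().sorted(cmp).limit(limit)
                .map(h -> h.doc).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
                new Hit(2, 0.5f), new Hit(0, 0.2f), new Hit(1, 0.9f));
        System.out.println(byDocNumber(hits, 3));    // document order
        System.out.println(byScore(hits, true, 1));  // most relevant hit only
        System.out.println(byScore(hits, false, 1)); // least relevant hit only
    }
}
```

With these three invented hits, document-number order yields the scores
0.2, 0.9, 0.5, which looks shuffled. A limit of 1 returns doc 1 when sorting
descending and doc 0 when sorting ascending.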

> An order by score would make a perfect default if respectDocumentOrder is set to false.

see also my comment above. however we could default to 'order by
@jcr:score descending' if there is a fulltext search clause in the
query. WDYT?

>>
>>
>> > This means, if
>> > I sort on 'order by @jcr:score' then, the first hit (lowest score)
>> > depends on my setLimit. If I do setLimit(1), I get the first
>> > authorized lucene hit, which has the highest score possible.
>>
>> are you sure, this is the case? if yes, then this is a bug and should
>> be fixed. it should return the least relevant node.
>
> I was not sure, I only did it by reasoning, but now i see:
>
> if (NameConstants.JCR_SCORE.equals(orderProps[i])) {
>     // order on jcr:score does not use the natural order as
>     // implemented in lucene. score ascending in lucene means that
>     // higher scores are first. JCR specs that lower score values
>     // are first.
>     sortFields.add(new SortField(null, SortField.SCORE, orderSpecs[i]));
> }
>
>
> so my claim is incorrect
>
>>
>>
>> > If I do
>> > setLimit(1000000000) I first get the lowest score as spec defaults to
>> > inverting (ascending) the lucene order. So the combination jcr:score
>> > which defaults to ascending, is not useful imo, and with a setLimit()
>> > returns quite unexpected results.
>> >
>> > I am not sure whether jsr-283 has some changes regarding this?
>>
>> no, this is still the same. higher score value means more relevant,
>> hence you have to sort descending to get the most relevant first.
>
> Perhaps a matter of taste...I would have always opted for the highest scores first as a
> default, and thus make an explicit exception for score.
>
> Bottom line, I'll default to adding 'order by @jcr:score descending' when no order is
> specified :-))

or we change jackrabbit to do that if there is a fulltext clause (aka
jcr:contains()) in the query :)
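
Until something like that exists, the client-side variant could look roughly
like the sketch below. DefaultScoreOrder and withDefaultOrder are
hypothetical names, not part of jackrabbit or the JCR API. The check is a
naive substring test on the XPath statement, so it would be fooled by a
search term that itself contains "order by"; treat it as an illustration of
the idea only.

```java
import java.util.Locale;

// Hypothetical helper: append a descending score order to XPath statements
// that have a fulltext clause but no explicit order by.
final class DefaultScoreOrder {

    private static final String SCORE_DESCENDING =
            " order by @jcr:score descending";

    static String withDefaultOrder(String xpath) {
        String lower = xpath.toLowerCase(Locale.ROOT);
        // Naive checks; assumes these substrings do not occur inside literals.
        boolean hasFulltextClause = lower.contains("jcr:contains(");
        boolean hasOrderBy = lower.contains(" order by ");
        if (hasFulltextClause && !hasOrderBy) {
            return xpath + SCORE_DESCENDING;
        }
        return xpath;
    }

    public static void main(String[] args) {
        System.out.println(withDefaultOrder("//*[jcr:contains(.,'foo')]"));
    }
}
```

The returned statement can then be passed on to the query manager as usual.
Statements that already carry an order by, or that have no jcr:contains(),
are returned unchanged.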

regards
 marcel

> thx for your replies Marcel
>
> Ard
>
>>
>>
>> regards
>>  marcel
>
