incubator-jena-dev mailing list archives

From "Paolo Castagna (JIRA)" <>
Subject [jira] [Commented] (JENA-242) LARQ scores not normalized
Date Fri, 04 May 2012 06:54:48 GMT


Paolo Castagna commented on JENA-242:

> Instead, I think you should use ORDER BY on the score, and then maybe LIMIT the results
> to a subset.

Hi Stephen, thanks for adding your comments. And yes, this is what I was trying to argue
on the jena-users mailing list. We both pointed at

However, the behaviour of LARQ has changed with the latest release: LARQ no longer reports
normalised scores. This is better, but it breaks compatibility with the past
(I don't think that is a problem, since probably only a few people are actually using LARQ in
any serious/production environment; I am ready to be proven wrong on this).
In particular, LARQ's documentation says you can limit the number of matches using:

 - ?lit pf:textMatch ( '+text' 100 ) . # Limit to at most 100 hits
 - ?lit pf:textMatch ( '+text' 0.5 ) . # Limit to Lucene scores of 0.5 and over.
 - ?lit pf:textMatch ( '+text' 0.5 100 ) . # Limit to scores of 0.5 and limit to 100 hits

I think we should allow only the following (this is my favourite choice):

 - ?lit pf:textMatch ( '+text' 100 ) . # Limit to at most 100 hits

If we are happy with this, I can close this issue as "Won't fix", explaining why. I can then
open another issue to remove the ability to limit results by score. 

Or, with less work (I am happy with this option as well), we just change the documentation to
state that the score is not normalised and varies from query to query (and in future it might
change as/if we add new indexing systems, such as Solr, ElasticSearch, etc.).

Lao, using ontology constructs to improve search results is a very interesting topic, but
not quite relevant to this issue. Here we are not trying to develop a better scoring system
for LARQ. We are discussing whether we should return normalised or non-normalised scores to
the users. Non-normalised scores cause a small issue only when people try to limit matches
by score via ?lit pf:textMatch ( '+text' 0.5 ).
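
For reference, Stephen's ORDER BY plus LIMIT suggestion could look roughly like the query
below. This is only a sketch: it assumes the (?lit ?score) subject-list form of pf:textMatch
described in the LARQ documentation and the usual pf: prefix, and the limit of 100 is just an
illustrative value. It has not been run against LARQ 1.0.0.

```sparql
PREFIX pf: <http://jena.hpl.hp.com/ARQ/property#>

# Sketch: rank hits by their raw (non-normalised) Lucene score
# and keep the best ones, instead of filtering on a fixed score threshold.
SELECT ?lit ?score
WHERE {
  # Assumed form: bind both the matched literal and its score.
  ( ?lit ?score ) pf:textMatch '+text' .
}
ORDER BY DESC(?score)   # highest-scoring hits first
LIMIT 100               # illustrative cut-off, not a recommended value
```

Because this only relies on the relative order of the scores within one query, it should keep
working whether or not the scores are normalised.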

Lao, Stephen (others?) what do you think?
> LARQ scores not normalized
> --------------------------
>                 Key: JENA-242
>                 URL:
>             Project: Apache Jena
>          Issue Type: Bug
>          Components: LARQ
>    Affects Versions: LARQ 1.0.0
>         Environment: Fuseki
>            Reporter: laotao
> In previous versions the LARQ score seemed to be normalized to range [0, 1]. In LARQ
> 1.0.0 some scores can be higher than 1.
> Normalized scores are needed to filter sparql results (so that only items above certain
> quality is shown).

This message is automatically generated by JIRA.