lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Uwe Schindler" <...@thetaphi.de>
Subject RE: comparing lucene scores across queries
Date Mon, 28 Mar 2011 08:11:42 GMT
Hi Patrick,

You can disable the coord factor in the constructor of BooleanQuery.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> Sent: Monday, March 28, 2011 10:09 AM
> To: java-user@lucene.apache.org
> Subject: Re: comparing lucene scores across queries
> 
> Hi, thanks for reply.
> 
> Yeah, I've read the Similarity class documentation several times, but I
need
> some tip.
> 
> My queries are BooleanQueries but they always have the same structure
> (the same structure of the docs, they are actually docs from collection):
3
> fields.
> 
> What if I simplify the similarity scores, by removing coord factor and
just
> leaving the cosine similarity which is comparable ?
> 
> I want to underline the fact that my boolean queries are just a
combination
> of "field:term" items, and I always have the same 3 fields with different
> terms obviously.
> 
> Thanks
> 
> 
> 
> 
> On 28 March 2011 10:03, Uwe Schindler <uwe@thetaphi.de> wrote:
> 
> > No, scores are in general not comparable between different queries.
> > The problem lies in many things:
> > - Each query has a norm factor that makes it more compareable if they
> > are sub clauses of a BooleanQuery. But you are right, this norm factor
> > should be the same.
> > - Some queries like FuzzyQuery rely on the terms in index and those
> > matches the query
> > - Inside Boolean queries, there is also a coord-factor involved
> >
> > If you are always using the same simple type of query (e.g. simple
> > TermQuery, only with different term) on the same index, you can
> > compare the scores. As soon as you are using complex queries (e.g
> > several terms compared in a BooleanQuery as QueryParser produces), the
> > scores are no longer comparable.
> >
> > You can read more on all factors that are included in scoring:
> >
> >
> http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/
> > Simila
> > rity.html
> >
> > -----
> > Uwe Schindler
> > H.-H.-Meier-Allee 63, D-28213 Bremen
> > http://www.thetaphi.de
> > eMail: uwe@thetaphi.de
> >
> >
> > > -----Original Message-----
> > > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > Sent: Monday, March 28, 2011 9:44 AM
> > > To: java-user@lucene.apache.org
> > > Subject: comparing lucene scores across queries
> > >
> > > Hi,
> > >
> > > sorry I've already asked few days ago, but I got no reply and I
> > > really
> > need
> > > some help on this..
> > >
> > > I'm running several queries against a doc collection. The queries
> > > are documents of the collection itself, I need to measure how
> > > similar is each document to the rest of the collection.
> > >
> > > Now, Lucene returns me a score per query, but I've been told such
> > > score
> > is
> > > not comparable across queries. Is this correct ?
> > >
> > > For example, arem't these scores comparable ?
> > > query1, score:8.324234
> > > query2, score:3.324238
> > >
> > > If so, why not ? Isn't the cosine similarity between the query
> > > vector and collection docs vectors ? I really need a comparable
measure.
> > >
> > > thanks
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message