lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: comparing lucene scores across queries
Date Mon, 28 Mar 2011 09:36:14 GMT
Hi,

You don't need to extend BooleanQuery; you can just pass "true" to its
constructor, see: http://s.apache.org/QvK
Of course you can also subclass DefaultSimilarity and return 1 from coord(),
but that is more work than passing true to a constructor.
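
To illustrate what disabling coord changes, here is a minimal plain-Java sketch with no Lucene dependency. It assumes the Lucene 3.x DefaultSimilarity behaviour, where coord is overlap / maxOverlap; the class name CoordSketch and the score() helper are hypothetical, and the sketch leaves out queryNorm, idf and the other scoring factors.

```java
// Plain-Java sketch (no Lucene dependency) of how the coord factor scales a
// BooleanQuery score. CoordSketch and score() are hypothetical names used
// only for illustration; real Lucene scoring has more factors than this.
public class CoordSketch {

    // Assumed to mirror DefaultSimilarity.coord in Lucene 3.x: the fraction
    // of query clauses that the document actually matched.
    public static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Sum the scores of the matched clauses, then multiply by coord unless
    // coord is disabled (the effect of passing true to the constructor).
    public static float score(float[] matchedClauseScores, int totalClauses,
                              boolean disableCoord) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        float c = disableCoord ? 1f : coord(matchedClauseScores.length, totalClauses);
        return sum * c;
    }

    public static void main(String[] args) {
        float[] matched = {1.5f, 0.5f}; // document matched 2 of 3 clauses
        System.out.println("with coord:    " + score(matched, 3, false));
        System.out.println("coord disabled: " + score(matched, 3, true));
    }
}
```

With coord enabled, a document that matches only some of the clauses is scaled down by the matched fraction; with coord disabled, the score is just the sum of the matched clause scores.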

For your type of queries, disabling coord should be enough, but I am not
100% sure! Why not simply try it out?
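
One way to try it out is to compare the Lucene scores against a reference value computed outside Lucene. The following is a minimal sketch of a plain cosine similarity over term-frequency maps; it is not what Lucene computes internally, and the class name CosineSketch and the raw term-frequency weighting are assumptions made only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Reference cosine similarity over term-frequency maps. This is a sketch for
// sanity-checking rankings, not a reproduction of Lucene's scoring formula.
public class CosineSketch {

    // Returns a value in [0, 1] for non-negative term frequencies,
    // or 0 if either vector is empty.
    public static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            int wa = e.getValue();
            normA += wa * wa;
            Integer wb = b.get(e.getKey());
            if (wb != null) {
                dot += wa * wb; // only shared terms contribute to the dot product
            }
        }
        for (int wb : b.values()) {
            normB += wb * wb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        Map<String, Integer> doc1 = new HashMap<String, Integer>();
        doc1.put("title:lucene", 1);
        doc1.put("body:score", 2);
        Map<String, Integer> doc2 = new HashMap<String, Integer>();
        doc2.put("title:lucene", 1);
        doc2.put("body:query", 1);
        System.out.println(cosine(doc1, doc2));
    }
}
```

Because both vectors are length-normalized, this value is on the same scale for every pair of documents, which is the property needed when comparing across queries.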

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> Sent: Monday, March 28, 2011 10:49 AM
> To: java-user@lucene.apache.org
> Subject: Re: comparing lucene scores across queries
> 
> One more thing, instead of extending the BooleanQuery class to remove the
> coord factor, can I also extend the Similarity class to do it ?
> 
> Still the other question is open: just to be sure, if I disable the coord
> factor, can I finally compare my BooleanQuery results?
> 
> thanks
> 
> >
> >
> >
> > On 28 March 2011 10:11, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> >> Hi Patrick,
> >>
> >> You can disable the coord factor in the constructor of BooleanQuery.
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: uwe@thetaphi.de
> >>
> >>
> >> > -----Original Message-----
> >> > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> >> > Sent: Monday, March 28, 2011 10:09 AM
> >> > To: java-user@lucene.apache.org
> >> > Subject: Re: comparing lucene scores across queries
> >> >
> >> > Hi, thanks for the reply.
> >> >
> >> > Yeah, I've read the Similarity class documentation several times,
> >> > but I need some tips.
> >> >
> >> > My queries are BooleanQueries, but they always have the same
> >> > structure (the same structure as the docs; they are actually docs
> >> > from the collection): 3 fields.
> >> >
> >> > What if I simplify the similarity scores by removing the coord factor
> >> > and just leaving the cosine similarity, which is comparable?
> >> >
> >> > I want to underline the fact that my boolean queries are just a
> >> > combination of "field:term" items, and I always have the same 3
> >> > fields, with different terms obviously.
> >> >
> >> > Thanks
> >> >
> >> >
> >> >
> >> >
> >> > On 28 March 2011 10:03, Uwe Schindler <uwe@thetaphi.de> wrote:
> >> >
> >> > > No, scores are in general not comparable between different queries.
> >> > > The problem lies in several things:
> >> > > - Each query has a norm factor that makes scores more comparable if
> >> > > they are sub clauses of a BooleanQuery. But you are right, this
> >> > > norm factor should be the same.
> >> > > - Some queries like FuzzyQuery rely on the terms in the index and
> >> > > which of those match the query
> >> > > - Inside Boolean queries, there is also a coord factor involved
> >> > >
> >> > > If you are always using the same simple type of query (e.g. a
> >> > > simple TermQuery, only with a different term) on the same index,
> >> > > you can compare the scores. As soon as you are using complex
> >> > > queries (e.g. several terms combined in a BooleanQuery, as
> >> > > QueryParser produces), the scores are no longer comparable.
> >> > >
> >> > > You can read more on all factors that are included in scoring:
> >> > >
> >> > > http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
> >> > >
> >> > > -----
> >> > > Uwe Schindler
> >> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> >> > > eMail: uwe@thetaphi.de
> >> > >
> >> > >
> >> > > > -----Original Message-----
> >> > > > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> >> > > > Sent: Monday, March 28, 2011 9:44 AM
> >> > > > To: java-user@lucene.apache.org
> >> > > > Subject: comparing lucene scores across queries
> >> > > >
> >> > > > Hi,
> >> > > >
> >> > > > sorry, I already asked a few days ago, but I got no reply and I
> >> > > > really need some help with this.
> >> > > >
> >> > > > I'm running several queries against a doc collection. The queries
> >> > > > are documents of the collection itself; I need to measure how
> >> > > > similar each document is to the rest of the collection.
> >> > > >
> >> > > > Now, Lucene returns a score per query, but I've been told such
> >> > > > scores are not comparable across queries. Is this correct?
> >> > > >
> >> > > > For example, aren't these scores comparable?
> >> > > > query1, score:8.324234
> >> > > > query2, score:3.324238
> >> > > >
> >> > > > If so, why not? Isn't the score the cosine similarity between the
> >> > > > query vector and the collection docs' vectors? I really need a
> >> > > > comparable measure.
> >> > > >
> >> > > > thanks
> >> > >
> >> > >
> >> > >
> >> > > ---------------------------------------------------------------------
> >> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> >> > >
> >> > >
> >>
> >>
> >>
> >>
> >

