lucene-java-user mailing list archives

From "Uwe Schindler" <...@thetaphi.de>
Subject RE: comparing lucene scores across queries
Date Mon, 28 Mar 2011 09:44:31 GMT
Hi,

Since you seem to want to do something very specific, it might still be
worth providing a modified Similarity (by subclassing DefaultSimilarity).
You could then, for example, also return 1.0 from queryNorm() to disable
it, which may also be a problem (but it isn't for your queries).
In theory, you can change the Similarity so that only the cosine
similarity is left, if that is the only part you want to use.
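The following self-contained Java sketch models the queryNorm factor as documented for Lucene 3.0.x DefaultSimilarity (queryNorm = 1 / sqrt(sumOfSquaredWeights), idf = 1 + ln(numDocs / (docFreq + 1))). It assumes every boost is 1.0 and places it next to a variant that returns 1.0, as suggested above. The class name, method names and document frequencies are hypothetical, not Lucene API. In Lucene itself you would override `queryNorm(float sumOfSquaredWeights)` in a DefaultSimilarity subclass and install it with `Searcher.setSimilarity(...)`. The sketch shows that queryNorm depends only on the query's own term weights.

```java
// Hypothetical model of DefaultSimilarity's queryNorm for a flat query of term
// clauses, assuming all query and term boosts are 1.0. Not the Lucene API itself.
final class QueryNormSketch {

    // DefaultSimilarity-style idf: 1 + ln(numDocs / (docFreq + 1)).
    static double idf(int docFreq, int numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1));
    }

    // sumOfSquaredWeights for a query whose clauses all have boost 1.0:
    // the sum of idf^2 over the query's terms.
    static double sumOfSquaredWeights(int[] docFreqs, int numDocs) {
        double sum = 0.0;
        for (int df : docFreqs) {
            double w = idf(df, numDocs);
            sum += w * w;
        }
        return sum;
    }

    // DefaultSimilarity-style queryNorm: 1 / sqrt(sumOfSquaredWeights).
    static double defaultQueryNorm(double sumOfSquaredWeights) {
        return 1.0 / Math.sqrt(sumOfSquaredWeights);
    }

    // What a subclass overriding queryNorm() to return 1.0 would use instead.
    static double flatQueryNorm(double sumOfSquaredWeights) {
        return 1.0;
    }

    public static void main(String[] args) {
        int numDocs = 1000;                 // hypothetical index size
        int[] query1 = {5, 40, 300};        // hypothetical docFreqs of query 1's terms
        int[] query2 = {900, 700, 2};       // hypothetical docFreqs of query 2's terms
        for (int[] q : new int[][] {query1, query2}) {
            double s = sumOfSquaredWeights(q, numDocs);
            System.out.printf("default queryNorm=%.4f, flat queryNorm=%.1f%n",
                    defaultQueryNorm(s), flatQueryNorm(s));
        }
    }
}
```

Every document matched by a given query is multiplied by the same queryNorm, so replacing it with 1.0 changes only the scale of that query's scores, not the ranking within it.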

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de


> -----Original Message-----
> From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> Sent: Monday, March 28, 2011 11:39 AM
> To: java-user@lucene.apache.org
> Subject: Re: comparing lucene scores across queries
> 
> OK, thanks, I will pass it. Well, I don't know how to verify it: even if I
> try it, I get some scores, but I don't know whether comparing them is
> reliable.
> 
> 
> On 28 March 2011 11:36, Uwe Schindler <uwe@thetaphi.de> wrote:
> 
> > Hi,
> >
> > You don't need to extend BooleanQuery; you can just pass "true" to its
> > ctor, see: http://s.apache.org/QvK
> > Of course you can also subclass DefaultSimilarity and return 1 from
> > coord(), but that is more work than passing true to a ctor.
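Here is a minimal, self-contained sketch of what the coord factor does. It assumes Lucene 3.0.x DefaultSimilarity semantics (coord = overlap / maxOverlap). The class and method names are illustrative, not Lucene API. In Lucene itself the switch is the `disableCoord` argument of `new BooleanQuery(boolean disableCoord)`, or an override of `coord(int overlap, int maxOverlap)` in a DefaultSimilarity subclass.

```java
// Hypothetical model of the coord factor applied to a BooleanQuery's summed
// clause scores, following DefaultSimilarity semantics. Not the Lucene API itself.
final class CoordSketch {

    // overlap    = number of clauses that matched the document
    // maxOverlap = total number of clauses in the BooleanQuery
    static float coord(int overlap, int maxOverlap, boolean disableCoord) {
        if (disableCoord) {
            // Disabling coord makes this factor a constant 1.0 for every document.
            return 1.0f;
        }
        return overlap / (float) maxOverlap;
    }

    public static void main(String[] args) {
        // A document matching 2 of 3 clauses, e.g. 2 of 3 "field:term" items.
        System.out.println(coord(2, 3, false)); // scaled down by 2/3
        System.out.println(coord(2, 3, true));  // left unscaled
    }
}
```

With coord enabled, a document that matches fewer of the clauses is penalized proportionally. Disabling it removes that penalty but leaves the per-clause scores untouched.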
> >
> > For your type of queries, disabling coord should be enough, but I am
> > not 100% sure! Why not simply try it out?
> >
> > Uwe
> >
> >
> >
> > > -----Original Message-----
> > > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > Sent: Monday, March 28, 2011 10:49 AM
> > > To: java-user@lucene.apache.org
> > > Subject: Re: comparing lucene scores across queries
> > >
> > > One more thing: instead of extending the BooleanQuery class to
> > > remove the coord factor, can I also extend the Similarity class to do it?
> > >
> > > The other question is still open: just to be sure, if I disable the
> > > coord factor, can I finally compare my BooleanQuery results?
> > >
> > > thanks
> > >
> > > >
> > > >
> > > >
> > > > On 28 March 2011 10:11, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >
> > > >> Hi Patrick,
> > > >>
> > > >> You can disable the coord factor in the constructor of BooleanQuery.
> > > >>
> > > >> Uwe
> > > >>
> > > >>
> > > >>
> > > >> > -----Original Message-----
> > > >> > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > >> > Sent: Monday, March 28, 2011 10:09 AM
> > > >> > To: java-user@lucene.apache.org
> > > >> > Subject: Re: comparing lucene scores across queries
> > > >> >
> > > >> > Hi, thanks for the reply.
> > > >> >
> > > >> > Yeah, I've read the Similarity class documentation several times,
> > > >> > but I need some tips.
> > > >> >
> > > >> > My queries are BooleanQueries, but they always have the same
> > > >> > structure as the docs (they are actually docs from the
> > > >> > collection): 3 fields.
> > > >> >
> > > >> > What if I simplify the similarity scores by removing the coord
> > > >> > factor and leaving just the cosine similarity, which is comparable?
> > > >> >
> > > >> > I want to stress that my boolean queries are just a combination
> > > >> > of "field:term" items, and I always have the same 3 fields, with
> > > >> > different terms obviously.
> > > >> >
> > > >> > Thanks
> > > >> >
> > > >> >
> > > >> >
> > > >> >
> > > >> > On 28 March 2011 10:03, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > >> >
> > > >> > > No, scores are in general not comparable between different queries.
> > > >> > > The problem has several causes:
> > > >> > > - Each query has a norm factor that makes its score more comparable
> > > >> > > when it is a sub-clause of a BooleanQuery. But you are right, this
> > > >> > > norm factor should be the same.
> > > >> > > - Some queries, like FuzzyQuery, depend on which terms in the
> > > >> > > index match the query.
> > > >> > > - Inside BooleanQueries, there is also a coord factor involved.
> > > >> > >
> > > >> > > If you are always using the same simple type of query (e.g. a
> > > >> > > simple TermQuery, only with a different term) on the same index,
> > > >> > > you can compare the scores. As soon as you use complex queries
> > > >> > > (e.g. several terms combined in a BooleanQuery, as QueryParser
> > > >> > > produces), the scores are no longer comparable.
> > > >> > >
> > > >> > > You can read more about all the factors included in scoring:
> > > >> > >
> > > >> > > http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
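For orientation, the Similarity documentation linked above describes the Lucene 3.0.x "practical scoring function" roughly as follows. This is a transcription, so check details against the javadoc before relying on them. The coord, queryNorm, tf and idf definitions shown are those of DefaultSimilarity. boost(·) stands for the query and term boosts, written q.getBoost() and t.getBoost() in the javadoc. norm(t,d) is the index-time field norm.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
  \sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot
  \mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)

\mathrm{coord}(q,d) = \frac{\mathrm{overlap}}{\mathrm{maxOverlap}},
\qquad
\mathrm{queryNorm}(q) = \frac{1}{\sqrt{\mathrm{boost}(q)^{2}\sum_{t \in q}\big(\mathrm{idf}(t)\cdot\mathrm{boost}(t)\big)^{2}}}

\mathrm{tf}(t \in d) = \sqrt{\mathrm{freq}},
\qquad
\mathrm{idf}(t) = 1 + \ln\frac{\mathrm{numDocs}}{\mathrm{docFreq}+1}
```

Disabling coord and returning 1.0 from queryNorm() leaves only the sum. The tf, idf and norm factors in that sum still depend on the index statistics and on document lengths.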
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > > -----Original Message-----
> > > >> > > > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > >> > > > Sent: Monday, March 28, 2011 9:44 AM
> > > >> > > > To: java-user@lucene.apache.org
> > > >> > > > Subject: comparing lucene scores across queries
> > > >> > > >
> > > >> > > > Hi,
> > > >> > > >
> > > >> > > > sorry, I already asked a few days ago, but I got no reply and I
> > > >> > > > really need some help on this.
> > > >> > > >
> > > >> > > > I'm running several queries against a doc collection. The queries
> > > >> > > > are documents of the collection itself: I need to measure how
> > > >> > > > similar each document is to the rest of the collection.
> > > >> > > >
> > > >> > > > Now, Lucene gives me a score per query, but I've been told that
> > > >> > > > such scores are not comparable across queries. Is this correct?
> > > >> > > >
> > > >> > > > For example, aren't these scores comparable?
> > > >> > > > query1, score:8.324234
> > > >> > > > query2, score:3.324238
> > > >> > > >
> > > >> > > > If so, why not? Isn't it the cosine similarity between the query
> > > >> > > > vector and the collection's document vectors? I really need a
> > > >> > > > comparable measure.
> > > >> > > >
> > > >> > > > thanks
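If what is needed is a measure that is comparable across queries, one option is to compute the cosine explicitly from term weights obtained separately. This is a swapped-in technique, not Lucene's own scoring. The weights could come, for example, from term vectors stored at index time (IndexReader.getTermFreqVector in Lucene 3.x). The Java sketch below assumes such weights are already in maps keyed by hypothetical "field:term" strings. For non-negative weights the result always lies between 0 and 1, whichever document acts as the query.

```java
import java.util.HashMap;
import java.util.Map;

// Cosine similarity between two sparse weight vectors keyed by hypothetical
// "field:term" strings. A sketch, not Lucene's scorer.
final class CosineSketch {

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        // Iterate over the smaller map when computing the dot product.
        Map<String, Double> small = a.size() <= b.size() ? a : b;
        Map<String, Double> large = (small == a) ? b : a;
        double dot = 0.0;
        for (Map.Entry<String, Double> e : small.entrySet()) {
            Double w = large.get(e.getKey());
            if (w != null) {
                dot += e.getValue() * w;
            }
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // treat similarity with an empty vector as 0
        }
        return dot / (normA * normB);
    }

    // Euclidean length of a weight vector.
    static double norm(Map<String, Double> v) {
        double s = 0.0;
        for (double w : v.values()) {
            s += w * w;
        }
        return Math.sqrt(s);
    }

    public static void main(String[] args) {
        Map<String, Double> doc1 = new HashMap<>();   // hypothetical weights
        doc1.put("title:lucene", 2.0);
        doc1.put("body:score", 1.0);
        Map<String, Double> doc2 = new HashMap<>();
        doc2.put("title:lucene", 1.0);
        doc2.put("body:query", 3.0);
        System.out.println(cosine(doc1, doc2));
    }
}
```

This divides by the Euclidean lengths of both vectors, which is what bounds it. Lucene 3.x instead uses lengthNorm (1/sqrt(numTerms) by default) for document length.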
> > > >> > >
> > > >> > >
> > > >> > >
> > > >> > > ---------------------------------------------------------------------
> > > >> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > >> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > >> > >
> > > >> > >
> > > >>
> > > >>
> > > >>
> > > >>
> > > >>
> > > >
> >
> >
> >
> >



