lucene-java-user mailing list archives

From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: comparing lucene scores across queries
Date Mon, 28 Mar 2011 10:21:47 GMT
I see. If you say the norm isn't a problem in my case, I will just
disable the coord factor by constructing the query with
new BooleanQuery(true), and I should be done.
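
For reference, a minimal sketch of what that flag changes, in plain Java with no Lucene dependency. CoordSketch, coord and booleanScore are hypothetical names. The formula overlap / maxOverlap is the one DefaultSimilarity uses for coord in Lucene 3.x; the sketch assumes the BooleanQuery score is the sum of the matching clause scores multiplied by coord:

```java
// Illustration only: these are hypothetical names, not Lucene classes.
final class CoordSketch {

    // Fraction of query clauses that matched the document
    // (same formula as DefaultSimilarity.coord in Lucene 3.x).
    static float coord(int overlap, int maxOverlap) {
        return overlap / (float) maxOverlap;
    }

    // Sum the scores of the clauses that matched; unless coord is
    // disabled, scale the sum by the fraction of clauses that matched.
    static float booleanScore(float[] matchedClauseScores, int totalClauses,
                              boolean disableCoord) {
        float sum = 0f;
        for (float s : matchedClauseScores) {
            sum += s;
        }
        if (disableCoord) {
            return sum;
        }
        return sum * coord(matchedClauseScores.length, totalClauses);
    }

    public static void main(String[] args) {
        float[] matched = {0.5f, 0.25f}; // two of three clauses matched
        System.out.println("with coord:    " + booleanScore(matched, 3, false));
        System.out.println("coord disabled: " + booleanScore(matched, 3, true));
    }
}
```

With coord enabled, the same clause scores give a different total depending on how many clauses the query has, which is one reason totals from differently sized queries do not line up.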

If this is not correct, please let me know.
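
As a point of comparison for the scores, here is a sketch of plain cosine similarity between two sparse term-weight vectors, the measure asked about at the start of this thread. It is not Lucene's scoring code: CosineSketch and cosine are hypothetical names, and the weights are assumed to be whatever term weighting you choose (tf-idf, for instance), keyed by "field:term".

```java
import java.util.Map;

// Illustration only: hypothetical names, not part of Lucene.
final class CosineSketch {

    // Cosine of the angle between two sparse vectors keyed by "field:term".
    // Returns 0.0 if either vector is empty or has zero length.
    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double wa = e.getValue();
            normA += wa * wa;
            Double wb = b.get(e.getKey());
            if (wb != null) {
                dot += wa * wb; // only shared terms contribute
            }
        }
        for (double wb : b.values()) {
            normB += wb * wb;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Because each vector is divided by its own length, the result does not depend on how many terms the query has, which is what makes it comparable from one query to the next.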

On 28 March 2011 11:44, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi,
>
> As you seem to want to do very specific things, it might still be
> worthwhile to provide a modified Similarity (by subclassing
> DefaultSimilarity). You could then, e.g., also return 1.0 from
> queryNorm() to disable it, as it may also be a problem (but it isn't
> for your queries). Theoretically, you can change the Similarity so that
> only the cosine similarity is left over, if that is all you want to use.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > Sent: Monday, March 28, 2011 11:39 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: comparing lucene scores across queries
> >
> > ok thanks, I will pass; well, I dunno how to verify it. Even if I try,
> > I then get some scores, but I dunno if comparing them is reliable.
> >
> >
> > On 28 March 2011 11:36, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > > Hi,
> > >
> > > You don't need to extend BooleanQuery; you can just pass "true" in its
> > > ctor, see: http://s.apache.org/QvK
> > > Of course you can also subclass DefaultSimilarity and return 1 as
> > > coord, but that is more work than passing true to a ctor.
> > >
> > > For your type of queries, disabling coord should be enough, but I am
> > > not 100% sure! Why not simply try it out?
> > >
> > > Uwe
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > Sent: Monday, March 28, 2011 10:49 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: Re: comparing lucene scores across queries
> > > >
> > > > One more thing: instead of extending the BooleanQuery class to
> > > > remove the coord factor, can I also extend the Similarity class to
> > > > do it?
> > > >
> > > > The other question is still open: just to be sure, if I disable the
> > > > coord factor, can I finally compare my BooleanQuery results?
> > > >
> > > > thanks
> > > >
> > > > >
> > > > >
> > > > >
> > > > > On 28 March 2011 10:11, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > > >
> > > > >> Hi Patrick,
> > > > >>
> > > > > >> You can disable the coord factor in the constructor of BooleanQuery.
> > > > >>
> > > > >> Uwe
> > > > >>
> > > > >> -----
> > > > >> Uwe Schindler
> > > > >> H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > > >> eMail: uwe@thetaphi.de
> > > > >>
> > > > >>
> > > > >> > -----Original Message-----
> > > > >> > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > >> > Sent: Monday, March 28, 2011 10:09 AM
> > > > >> > To: java-user@lucene.apache.org
> > > > >> > Subject: Re: comparing lucene scores across queries
> > > > >> >
> > > > >> > Hi, thanks for reply.
> > > > >> >
> > > > > >> > Yeah, I've read the Similarity class documentation several
> > > > > >> > times, but I need some tips.
> > > > > >> >
> > > > > >> > My queries are BooleanQueries, but they always have the same
> > > > > >> > structure (the same structure as the docs; they are actually
> > > > > >> > docs from the collection): 3 fields.
> > > > > >> >
> > > > > >> > What if I simplify the similarity scores by removing the coord
> > > > > >> > factor and just leaving the cosine similarity, which is
> > > > > >> > comparable?
> > > > > >> >
> > > > > >> > I want to underline the fact that my boolean queries are just a
> > > > > >> > combination of "field:term" items, and I always have the same
> > > > > >> > 3 fields, with different terms obviously.
> > > > >> >
> > > > >> > Thanks
> > > > >> >
> > > > >> >
> > > > >> >
> > > > >> >
> > > > > >> > On 28 March 2011 10:03, Uwe Schindler <uwe@thetaphi.de> wrote:
> > > > >> >
> > > > > >> > > No, scores are in general not comparable between different
> > > > > >> > > queries. The problem lies in many things:
> > > > > >> > > - Each query has a norm factor that makes scores more
> > > > > >> > >   comparable when the queries are sub-clauses of a
> > > > > >> > >   BooleanQuery. But you are right, this norm factor should
> > > > > >> > >   be the same.
> > > > > >> > > - Some queries, like FuzzyQuery, depend on which terms in the
> > > > > >> > >   index match the query.
> > > > > >> > > - Inside Boolean queries, there is also a coord factor
> > > > > >> > >   involved.
> > > > > >> > >
> > > > > >> > > If you are always using the same simple type of query (e.g.
> > > > > >> > > a simple TermQuery, only with a different term) on the same
> > > > > >> > > index, you can compare the scores. As soon as you are using
> > > > > >> > > complex queries (e.g. several terms combined in a
> > > > > >> > > BooleanQuery, as QueryParser produces), the scores are no
> > > > > >> > > longer comparable.
> > > > > >> > >
> > > > > >> > > You can read more on all factors that are included in scoring:
> > > > > >> > >
> > > > > >> > > http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/Similarity.html
> > > > >> > >
> > > > >> > > -----
> > > > >> > > Uwe Schindler
> > > > >> > > H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
> > > > >> > > eMail: uwe@thetaphi.de
> > > > >> > >
> > > > >> > >
> > > > >> > > > -----Original Message-----
> > > > >> > > > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > >> > > > Sent: Monday, March 28, 2011 9:44 AM
> > > > >> > > > To: java-user@lucene.apache.org
> > > > >> > > > Subject: comparing lucene scores across queries
> > > > >> > > >
> > > > >> > > > Hi,
> > > > >> > > >
> > > > > >> > > > Sorry, I already asked a few days ago, but I got no reply
> > > > > >> > > > and I really need some help on this.
> > > > > >> > > >
> > > > > >> > > > I'm running several queries against a doc collection. The
> > > > > >> > > > queries are documents of the collection itself; I need to
> > > > > >> > > > measure how similar each document is to the rest of the
> > > > > >> > > > collection.
> > > > > >> > > >
> > > > > >> > > > Now, Lucene returns a score per query, but I've been told
> > > > > >> > > > such scores are not comparable across queries. Is this
> > > > > >> > > > correct?
> > > > > >> > > >
> > > > > >> > > > For example, aren't these scores comparable?
> > > > > >> > > > query1, score:8.324234
> > > > > >> > > > query2, score:3.324238
> > > > > >> > > >
> > > > > >> > > > If so, why not? Isn't it the cosine similarity between the
> > > > > >> > > > query vector and the collection docs' vectors? I really
> > > > > >> > > > need a comparable measure.
> > > > >> > > > thanks
> > > > >> > >
> > > > >> > >
> > > > >> > >
> > > > > >> > > ---------------------------------------------------------------------
> > > > > >> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > > > >> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > > > >> > >
> > > > >> > >
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >>
> > > > >
> > >
> > >
> > >
> > >
>
>
>
>
