lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: comparing lucene scores across queries
Date Mon, 28 Mar 2011 08:35:49 GMT
Cool, so just to be sure, if I disable the coord factor I can finally
compare my BooleanQuery results ?



On 28 March 2011 10:11, Uwe Schindler <uwe@thetaphi.de> wrote:

> Hi Patrick,
>
> You can disable the coord factor in the constructor of BooleanQuery.
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
>
> > -----Original Message-----
> > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > Sent: Monday, March 28, 2011 10:09 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: comparing lucene scores across queries
> >
> > Hi, thanks for reply.
> >
> > Yeah, I've read the Similarity class documentation several times, but I
> need
> > some tip.
> >
> > My queries are BooleanQueries but they always have the same structure
> > (the same structure of the docs, they are actually docs from collection):
> 3
> > fields.
> >
> > What if I simplify the similarity scores, by removing coord factor and
> just
> > leaving the cosine similarity which is comparable ?
> >
> > I want to underline the fact that my boolean queries are just a
> combination
> > of "field:term" items, and I always have the same 3 fields with different
> > terms obviously.
> >
> > Thanks
> >
> >
> >
> >
> > On 28 March 2011 10:03, Uwe Schindler <uwe@thetaphi.de> wrote:
> >
> > > No, scores are in general not comparable between different queries.
> > > The problem lies in many things:
> > > - Each query has a norm factor that makes it more compareable if they
> > > are sub clauses of a BooleanQuery. But you are right, this norm factor
> > > should be the same.
> > > - Some queries like FuzzyQuery rely on the terms in index and those
> > > matches the query
> > > - Inside Boolean queries, there is also a coord-factor involved
> > >
> > > If you are always using the same simple type of query (e.g. simple
> > > TermQuery, only with different term) on the same index, you can
> > > compare the scores. As soon as you are using complex queries (e.g
> > > several terms compared in a BooleanQuery as QueryParser produces), the
> > > scores are no longer comparable.
> > >
> > > You can read more on all factors that are included in scoring:
> > >
> > >
> > http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/
> > > Simila
> > > rity.html
> > >
> > > -----
> > > Uwe Schindler
> > > H.-H.-Meier-Allee 63, D-28213 Bremen
> > > http://www.thetaphi.de
> > > eMail: uwe@thetaphi.de
> > >
> > >
> > > > -----Original Message-----
> > > > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
> > > > Sent: Monday, March 28, 2011 9:44 AM
> > > > To: java-user@lucene.apache.org
> > > > Subject: comparing lucene scores across queries
> > > >
> > > > Hi,
> > > >
> > > > sorry I've already asked few days ago, but I got no reply and I
> > > > really
> > > need
> > > > some help on this..
> > > >
> > > > I'm running several queries against a doc collection. The queries
> > > > are documents of the collection itself, I need to measure how
> > > > similar is each document to the rest of the collection.
> > > >
> > > > Now, Lucene returns me a score per query, but I've been told such
> > > > score
> > > is
> > > > not comparable across queries. Is this correct ?
> > > >
> > > > For example, arem't these scores comparable ?
> > > > query1, score:8.324234
> > > > query2, score:3.324238
> > > >
> > > > If so, why not ? Isn't the cosine similarity between the query
> > > > vector and collection docs vectors ? I really need a comparable
> measure.
> > > >
> > > > thanks
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message