lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Patrick Diviacco <patrick.divia...@gmail.com>
Subject Re: comparing lucene scores across queries
Date Mon, 28 Mar 2011 08:48:41 GMT
One more thing, instead of extending the BooleanQuery class to remove the
coord factor, can I also extend the Similarity class to do it ?

Still the other question is open: just to be sure, if I disable the coord
factor I can finally compare my BooleanQuery results ?

thanks

>
>
>
> On 28 March 2011 10:11, Uwe Schindler <uwe@thetaphi.de> wrote:
>
>> Hi Patrick,
>>
>> You can disable the coord factor in the constructor of BooleanQuery.
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: uwe@thetaphi.de
>>
>>
>> > -----Original Message-----
>> > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
>> > Sent: Monday, March 28, 2011 10:09 AM
>> > To: java-user@lucene.apache.org
>> > Subject: Re: comparing lucene scores across queries
>> >
>> > Hi, thanks for reply.
>> >
>> > Yeah, I've read the Similarity class documentation several times, but I
>> need
>> > some tip.
>> >
>> > My queries are BooleanQueries but they always have the same structure
>> > (the same structure of the docs, they are actually docs from
>> collection):
>> 3
>> > fields.
>> >
>> > What if I simplify the similarity scores, by removing coord factor and
>> just
>> > leaving the cosine similarity which is comparable ?
>> >
>> > I want to underline the fact that my boolean queries are just a
>> combination
>> > of "field:term" items, and I always have the same 3 fields with
>> different
>> > terms obviously.
>> >
>> > Thanks
>> >
>> >
>> >
>> >
>> > On 28 March 2011 10:03, Uwe Schindler <uwe@thetaphi.de> wrote:
>> >
>> > > No, scores are in general not comparable between different queries.
>> > > The problem lies in many things:
>> > > - Each query has a norm factor that makes it more compareable if they
>> > > are sub clauses of a BooleanQuery. But you are right, this norm factor
>> > > should be the same.
>> > > - Some queries like FuzzyQuery rely on the terms in index and those
>> > > matches the query
>> > > - Inside Boolean queries, there is also a coord-factor involved
>> > >
>> > > If you are always using the same simple type of query (e.g. simple
>> > > TermQuery, only with different term) on the same index, you can
>> > > compare the scores. As soon as you are using complex queries (e.g
>> > > several terms compared in a BooleanQuery as QueryParser produces), the
>> > > scores are no longer comparable.
>> > >
>> > > You can read more on all factors that are included in scoring:
>> > >
>> > >
>> > http://lucene.apache.org/java/3_0_3/api/core/org/apache/lucene/search/
>> > > Simila
>> > > rity.html
>> > >
>> > > -----
>> > > Uwe Schindler
>> > > H.-H.-Meier-Allee 63, D-28213 Bremen
>> > > http://www.thetaphi.de
>> > > eMail: uwe@thetaphi.de
>> > >
>> > >
>> > > > -----Original Message-----
>> > > > From: Patrick Diviacco [mailto:patrick.diviacco@gmail.com]
>> > > > Sent: Monday, March 28, 2011 9:44 AM
>> > > > To: java-user@lucene.apache.org
>> > > > Subject: comparing lucene scores across queries
>> > > >
>> > > > Hi,
>> > > >
>> > > > sorry I've already asked few days ago, but I got no reply and I
>> > > > really
>> > > need
>> > > > some help on this..
>> > > >
>> > > > I'm running several queries against a doc collection. The queries
>> > > > are documents of the collection itself, I need to measure how
>> > > > similar is each document to the rest of the collection.
>> > > >
>> > > > Now, Lucene returns me a score per query, but I've been told such
>> > > > score
>> > > is
>> > > > not comparable across queries. Is this correct ?
>> > > >
>> > > > For example, arem't these scores comparable ?
>> > > > query1, score:8.324234
>> > > > query2, score:3.324238
>> > > >
>> > > > If so, why not ? Isn't the cosine similarity between the query
>> > > > vector and collection docs vectors ? I really need a comparable
>> measure.
>> > > >
>> > > > thanks
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >
>> > >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>

Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message