lucene-dev mailing list archives

From "Julien Nioche" <Julien.Nio...@lingway.com>
Subject Re: Help with scoring, coordination factor?
Date Fri, 30 Apr 2004 09:53:09 GMT
[I am moving this discussion to the dev list]

> Then use this in place of BooleanQuery when you don't want coordination
> scoring.  I think that should do the trick.

In my case it works perfectly. As we generate multilingual and semantic
expansions of the original words of a query, the coordination factor was
giving a lower score to words with a lot of semantic or morphological
variants. The Query objects that extend BooleanQuery (let's call them
WordQueries just for clarity of the explanation) are combined into a
BooleanQuery object that uses the default coord factor.

What I'd like to do now is give those WordQueries an indication of
relevancy. For example, given the user query "generation of semantic
variants", our system will decide that 'generation' and its variants
(generations, generated, ...) is not particularly important compared to
the term 'variants', which in turn is less important than 'semantic'.
Let's give the terms the following relevancy scores:
generation = 1
semantic = 3
variants = 2

The idea that a given term carries more or less information within a given
domain is also what lies behind tf/idf weighting.

Let's take an example. We try this query on an index, but no document is
found that matches all the WordQueries. Instead we get a document containing
one or more expansions of the WordQuery 'generation' and one or more
expansions of the WordQuery 'variants' (i.e. a document with the following
text: "... the generated variant is ..."). We also find another document
matching the WordQuery 'semantic' and the WordQuery 'variants'.

With the default coord factor, both documents match 2 of the 3 WordQueries,
so in the first case the score would be:
(score WordQuery 'generation' + score WordQuery 'variants')*(2/3)
and in the second:
(score WordQuery 'semantic' + score WordQuery 'variants')*(2/3)

Whatever the scores may be for each WordQuery, what I'd like to have is:

score for the first document:
(score WordQuery 'generation' + score WordQuery 'variants')*((1+2)/(1+2+3))
score for the second:
(score WordQuery 'semantic' + score WordQuery 'variants')*((3+2)/(1+2+3))
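
To make the arithmetic above concrete, here is a minimal sketch in plain
Java. It does not use any Lucene classes: the class name WeightedCoordSketch
and both method names are hypothetical, and the weights are simply the
relevancy scores listed earlier (generation = 1, semantic = 3, variants = 2).
It only shows the factor I would like to obtain, not how to plug it into the
scorer.

```java
// Hypothetical helper for illustration only; not part of Lucene.
final class WeightedCoordSketch {

    // Default-style coordination: matched clauses / total clauses.
    static float coord(int overlap, int maxOverlap) {
        return (float) overlap / (float) maxOverlap;
    }

    // Weighted coordination: sum of the relevancy weights of the matched
    // clauses divided by the sum of the weights of all clauses.
    static float weightedCoord(float[] weights, boolean[] matched) {
        if (weights.length != matched.length) {
            throw new IllegalArgumentException("weights and matched must have the same length");
        }
        float matchedSum = 0f;
        float totalSum = 0f;
        for (int i = 0; i < weights.length; i++) {
            totalSum += weights[i];
            if (matched[i]) {
                matchedSum += weights[i];
            }
        }
        // Avoid dividing by zero when no weights are given.
        return totalSum == 0f ? 0f : matchedSum / totalSum;
    }
}
```

With the weights in the order {generation, semantic, variants} = {1, 3, 2},
the first document (generation + variants) gets 3/6 and the second
(semantic + variants) gets 5/6, whereas the default factor is 2/3 for both.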

I created a new type of Query extending BooleanQuery that combines
WordQueries; however, the coordination information is currently only a
boolean indicating whether or not a given Query matches a document.

Does anyone have any idea how I can achieve this?

Thanks a lot


----- Original Message -----
From: "Doug Cutting" <cutting@apache.org>
To: "Lucene Users List" <lucene-user@jakarta.apache.org>
Sent: Thursday, April 29, 2004 9:37 PM
Subject: Re: Help with scoring, coordination factor?


> Matthew W. Bilotti wrote:
> > We suspect the coordination term is driving down
> > these documents' ranks and we would like to bring those documents back
> > up to where they should be.
>
> That sounds right to me.
>
> > Is there a relatively easy way to implement what we want using Lucene?
> > Would it be better to try to supply a Similarity class with a
> > special-purpose coord method  [ ... ]
>
> I think this is a good approach.
>
> In 1.4, you can do something like:
>
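> import org.apache.lucene.search.BooleanQuery;
> import org.apache.lucene.search.DefaultSimilarity;
> import org.apache.lucene.search.Searcher;
> import org.apache.lucene.search.Similarity;
>
> public class NoCoordBooleanQuery extends BooleanQuery {
>
>    // Similarity whose coordination factor is always 1, so documents
>    // matching only some of the clauses are not penalized.
>    private static final Similarity SIMILARITY = new DefaultSimilarity() {
>      public float coord(int overlap, int max) {
>        return 1.0f;
>      }
>    };
>
>    // Use the no-coord Similarity for this query, whatever the searcher's is.
>    public Similarity getSimilarity(Searcher searcher) {
>      return SIMILARITY;
>    }
>
> }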
>
> Then use this in place of BooleanQuery when you don't want coordination
> scoring.  I think that should do the trick.
>
> Doug


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org

