lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: custom scoring
Date Fri, 18 Jul 2008 14:54:13 GMT
Hi S├ębastien,

Have you looked into the DisjunctionMaxQuery <http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/DisjunctionMaxQuery.html>?
 From that page:

   A query that generates the union of documents produced by its
   subqueries, and that scores each document with the maximum score
   for that document as produced by any subquery, plus a tie
   breaking increment for any additional matching subqueries. This
   is useful when searching for a word in multiple fields with
   different boost factors (so that the fields cannot be combined
   equivalently into a single search field). We want the primary
   score to be the one associated with the highest boost, not the
   sum of the field scores (as BooleanQuery would give).

Here's the Solr wiki page for it: <http://wiki.apache.org/solr/DisMaxRequestHandler>.

I think you could just give the name and description fields the same boost, and you'd be pretty
close to getting what you want right out of the box with Solr.

Steve

On 07/17/2008 at 9:24 PM, S├ębastien Rainville wrote:
> Hi,
> 
> I would like to customize the lucene scoring to remove the
> effect of the "coord()" parameter in Similarity for the
> number of fields in which the whole query is found.
> 
> First, here's what I'm trying to achieve:
> 
> For example, if I have the query "New York" and that the
> documents have 2 fields: name and description. What I want
> is that it doesn't make a difference if "New York" is found
> in both the name and the description or just one of them.
> Similarity.coord() seems to be the parameter responsible
> for that...
> 
> My first attempt to solve that was to supply my own
> implementation of Similarity that is a subclass of
> DefaultSimilarity and simply override coord(). To my surprise,
> I found out that that parameter is not only used to take into
> consideration the number of fields in which the query was
> found but also the number of terms in the query found in a
> given field. So, the effect is that the number of fields in
> which the query is found is cancelled but it introduces the
> problem that now the query "New York" doesn't favor "New York"
> over "York" anymore, which is really problematic...
> 
> Then, I saw this:
> http://hudson.zones.apache.org/hudson/job/Lucene-trunk/javadoc//org/apache/lucene/search/package-summary.html#scoring
> 
> It's a very vague description. I then started to look into
> the lucene code and now I'm at the point where I don't really
> know where to start and if I'm looking at the right thing.
> 
> By the way, I'm using Solr... but the scoring is a lucene
> thing so I think I'm at the right place.
> 
> So, what I'm looking for is a hint on where and how I should start.
> 
> Thanks in advance,
> Sebastien

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message