lucene-java-user mailing list archives

From "Chuck Williams" <ch...@manawiz.com>
Subject RE: lucene Scorers
Date Wed, 24 Nov 2004 00:45:16 GMT
Hi Ken,

I'm glad our replies were helpful.  It sounds like you looked at the
code in MaxDisjunctionQuery, so you probably noticed that it also
implements skipTo().  Your suggestion sounds like a good thing to do.  I
thought about that when writing MaxDisjunctionQuery, but didn't need the
generality, and it does make the code more complex.  I think Lucene
needs one of these mechanisms in it, at least to solve the problems
associated with the current default use of BooleanQuery for multiple
field expansions.  Your proposal would generalize this to solve
additional cases where different accrual operators are appropriate.
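
To make the accrual idea concrete, here is a minimal sketch in plain Java.
It uses no Lucene classes. The names AccrualOperator, Accruals, and
maxWithTieBreaker are hypothetical, and the tie-breaker formula (the maximum
plus a fraction of the remaining clause scores) is an assumption about how
such an operator might combine max and sum, not the code posted for
MaxDisjunctionQuery.

```java
/** Hypothetical interface: combines the scores of the clauses that matched one document. */
interface AccrualOperator {
    float combine(float[] clauseScores);
}

/** Hypothetical accrual operators; assumes non-negative clause scores. */
final class Accruals {
    private Accruals() {}

    /** Adds all clause scores together. */
    static final AccrualOperator SUM = scores -> {
        float sum = 0f;
        for (float s : scores) sum += s;
        return sum;
    };

    /** Keeps only the best clause score. */
    static final AccrualOperator MAX = scores -> {
        float max = 0f;
        for (float s : scores) max = Math.max(max, s);
        return max;
    };

    /** Best clause score plus tieBreaker times the sum of the other scores (assumed formula). */
    static AccrualOperator maxWithTieBreaker(float tieBreaker) {
        return scores -> {
            float sum = 0f;
            float max = 0f;
            for (float s : scores) {
                sum += s;
                max = Math.max(max, s);
            }
            return max + tieBreaker * (sum - max);
        };
    }
}

final class AccrualDemo {
    public static void main(String[] args) {
        // Scores of one document in two fields, e.g. title and body.
        float[] fieldScores = {0.2f, 0.5f};
        System.out.println("sum: " + Accruals.SUM.combine(fieldScores));
        System.out.println("max: " + Accruals.MAX.combine(fieldScores));
        System.out.println("max+tie: " + Accruals.maxWithTieBreaker(0.1f).combine(fieldScores));
    }
}
```

A query that accepted such an operator could leave the choice of sum, max,
or a blend to the caller, at the cost of the extra complexity mentioned
above.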

You could write and submit the generalization, although there are no
guarantees anybody would do anything with it.  I didn't get anywhere in
my attempt to submit MaxDisjunctionQuery.  I think there is also a
serious problem in scoring with the current score normalization (it does
not provide meaningfully comparable scores across different searches,
which means that absolute score numbers like 0.8 have no intrinsic
meaning concerning how good a result is or is not).  When I finally get
back to tuning search in my app, that's the next one I'll try a
submission on.
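
The comparability problem can be illustrated with a small sketch. It
assumes a normalization that divides every raw score by the top raw score
of the same search whenever that top score exceeds 1.0. That scheme is an
assumption made for illustration, not something stated in this thread, and
the class and method names are hypothetical.

```java
/** Hypothetical demo of per-search score normalization (assumed scheme). */
final class ScoreNormalizationDemo {
    private ScoreNormalizationDemo() {}

    /** Scales scores so the top one is at most 1.0; assumes non-negative raw scores. */
    static float[] normalize(float[] rawScores) {
        float max = 0f;
        for (float s : rawScores) max = Math.max(max, s);
        // Only rescale when the best raw score exceeds 1.0.
        float norm = max > 1.0f ? 1.0f / max : 1.0f;
        float[] out = new float[rawScores.length];
        for (int i = 0; i < rawScores.length; i++) {
            out[i] = rawScores[i] * norm;
        }
        return out;
    }

    public static void main(String[] args) {
        // Two searches whose raw scores differ by a factor of four.
        float[] strongSearch = normalize(new float[] {8.0f, 4.0f});
        float[] weakSearch = normalize(new float[] {2.0f, 1.0f});
        // Both searches report the same normalized scores for their top two hits.
        System.out.println(java.util.Arrays.toString(strongSearch));
        System.out.println(java.util.Arrays.toString(weakSearch));
    }
}
```

Under this assumption, both result lists come out with identical normalized
scores even though the raw scores of the first search are four times
higher, so a given number says nothing about absolute match quality.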

Chuck

  > -----Original Message-----
  > From: Ken McCracken [mailto:ken.mccracken@gmail.com]
  > Sent: Tuesday, November 23, 2004 4:31 PM
  > To: Lucene Users List
  > Subject: Re: lucene Scorers
  > 
  > Hi,
  > 
  > Thanks for the pointers in your replies.  Would it be possible to include
  > some sort of accrual scorer interface somewhere in the Lucene Query
  > APIs?  This could be passed into a query similar to
  > MaxDisjunctionQuery; and combine the sum, max, tieBreaker, etc.,
  > according to the implementor's discretion, to compute the overall
  > score for a document.
  > 
  > -Ken
  > 
  > On Sat, 13 Nov 2004 12:07:05 +0100, Paul Elschot
  > <paul.elschot@xs4all.nl> wrote:
  > > On Friday 12 November 2004 22:56, Chuck Williams wrote:
  > >
  > >
  > > > I had a similar need and wrote MaxDisjunctionQuery and
  > > > MaxDisjunctionScorer.  Unfortunately these are not available as a patch
  > > > but I've included the original message below that has the code (modulo
  > > > line breaks added by simple text email format).
  > > >
  > > > This code is functional -- I use it in my app.  It is optimized for its
  > > > stated use, which involves a small number of clauses.  You'd want to
  > > > improve the incremental sorting (e.g., using the bucket technique of
  > > > BooleanQuery) if you need it for large numbers of clauses.
  > >
  > > If you're interested, you can also have a look here for
  > > yet another DisjunctionScorer:
  > > http://issues.apache.org/bugzilla/show_bug.cgi?id=31785
  > >
  > > It has the advantage that it implements skipTo() so that it can
  > > be used as a subscorer of ConjunctionScorer, i.e., it can be
  > > faster in situations like this:
  > >
  > > aa AND (bb OR cc)
  > >
  > > where bb and cc are treated by the DisjunctionScorer.
  > > When aa is a filter this can also be used to implement
  > > a filtering query.
  > >
  > >
  > >
  > >
  > > > Re. Paul's suggested steps below, I did not integrate this with the
  > > > query parser as I didn't need that functionality (since I'm generating
  > > > the multi-field expansions for which max is a much better scoring
  > > > choice than sum).
  > > >
  > > > Chuck
  > > >
  > > > Included message:
  > > >
  > > > -----Original Message-----
  > > > From: Chuck Williams [mailto:chuck@manawiz.com]
  > > > Sent: Monday, October 11, 2004 9:55 PM
  > > > To: lucene-dev@jakarta.apache.org
  > > > Subject: Contribution: better multi-field searching
  > > >
  > > > The files included below (MaxDisjunctionQuery.java and
  > > > MaxDisjunctionScorer.java) provide a new mechanism for searching across
  > > > multiple fields.
  > >
  > > The maximum indeed works well, also when the fields differ a lot in
  > > length.
  > >
  > > Regards,
  > > Paul
  > >
  > >
  > >
  > >
  > >
  > > ---------------------------------------------------------------------
  > > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
  > > For additional commands, e-mail: lucene-user-help@jakarta.apache.org
  > >
  > >
  > 


