lucene-dev mailing list archives

From "Chuck Williams" <ch...@manawiz.com>
Subject RE: Boolean Scorer
Date Sun, 12 Dec 2004 03:01:21 GMT
I'd be surprised if the function call overhead was significant, but
nonetheless I can't argue with optimizing the sum case.  However, it
would seem this could be achieved without losing the generality by
having DisjunctionScorer.advanceAfterCurrent() call the initialization
and accumulation methods, while DisjunctionSumScorer overrides
advanceAfterCurrent() to implement its optimization.  This seems more
natural to me than having the general class be optimized for sum.
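
As a rough sketch of that factoring (not the actual Lucene classes): the
names GeneralDisjunctionSketch, init, accumulate and finish are made up for
illustration, and advanceAfterCurrent is simplified to take the scores of
the sub-scorers matching the current document as an array, assumed
non-empty, rather than pulling them from a scorer queue.

```java
// Hypothetical sketch only; simplified stand-ins, not Lucene's real scorers.
abstract class GeneralDisjunctionSketch {
    protected float currentScore;

    // General path: combine scores through overridable hooks.
    // Assumes subScores holds at least one matching sub-scorer's score.
    protected void advanceAfterCurrent(float[] subScores) {
        float state = init(subScores[0]);
        for (int i = 1; i < subScores.length; i++) {
            state = accumulate(state, subScores[i]);
        }
        currentScore = finish(state, subScores.length);
    }

    protected abstract float init(float firstScore);

    protected abstract float accumulate(float state, float nextScore);

    // Default final step: the state is the score (a coord factor could go here).
    protected float finish(float state, int nrMatchers) {
        return state;
    }

    float score() {
        return currentScore;
    }
}

// Max combination uses the general hook-based path unchanged.
class MaxDisjunctionSketch extends GeneralDisjunctionSketch {
    protected float init(float firstScore) {
        return firstScore;
    }

    protected float accumulate(float state, float nextScore) {
        return Math.max(state, nextScore);
    }
}

// Sum combination overrides advanceAfterCurrent to avoid the hook calls.
class SumDisjunctionSketch extends GeneralDisjunctionSketch {
    protected float init(float firstScore) {
        return firstScore;
    }

    protected float accumulate(float state, float nextScore) {
        return state + nextScore;
    }

    @Override
    protected void advanceAfterCurrent(float[] subScores) {
        float sum = 0f;
        for (int i = 0; i < subScores.length; i++) {
            sum += subScores[i]; // inlined accumulation, no virtual calls
        }
        currentScore = sum;
    }
}
```

The general class stays neutral about the combination, and only the sum
subclass pays for its optimization with an overridden method.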

<soapbox>
I maintain the belief that max is *required* to implement reasonable
multi-field searching (1).  I can't imagine a case where the current
MultiFieldQueryParser actually does the right thing.  For users wishing
to take a typical query and have it search all fields of their
documents, they're going to get horrible results.  Maybe they won't
notice -- I care about the quality of results, did notice, and was
surprised I had to write my own class.  After all, Lucene is generally
an excellent search engine and it uses a multi-field-based document
model (which is a good thing).  I would think that good results for
multi-field searching out-of-the-box, and therefore built-in support for
max, would be viewed as required.

It doesn't really matter to me, because I have made it work right, and
will be able to make it work right again with the new scheme.  It's just
that I really like Lucene and am encouraging others to use it.  I love
the performance and am glad there is such emphasis placed on this area.
I'm also happy there is serious attention paid to ensuring the software
is easy to specialize or otherwise customize.  However, that same kind
of care does not seem to carry over to the quality of built-in relevance
ranking, nor to the quality and consistency of the scoring model in
general.  In these areas, I must say Lucene is weak.  Based on
experience in the commercial enterprise search engine market, this is
all too common, and the reason that most internal and site searches
produce such horrible results.  IT people focus only on performance,
scalability and architecture while the users are screaming that the
results are no good.  I've seen this pattern in many places.
</soapbox>

Chuck

(1)  Actually MaxDisjunctionScorer does something a little more refined
-- it starts with max and then adds in a specified, presumably small,
constant times the sum of the other terms.  The max part solves the
multi-field problem that is currently in Lucene; i.e., with max, a result
matching multiple distinct query terms spread over multiple fields
generally gets a higher score than another result matching fewer query
terms overall but having the same number of matches in each field.  The contribution
of the small constant times the sum over the remaining terms allows a
result where a term matches in multiple fields to rise above other
results matching the same total term set in the same fields but without
the multiple matches. 
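
To make the arithmetic in (1) concrete, here is a minimal sketch of the
combination, not the MaxDisjunctionScorer code itself.  The class name
MaxPlusSumSketch, the method combine and the parameter tieBreaker are
hypothetical names chosen for illustration.

```java
// Hypothetical sketch of "max plus a small constant times the other scores".
final class MaxPlusSumSketch {
    private MaxPlusSumSketch() {
    }

    // fieldScores: one term's scores in each field where it matched.
    // tieBreaker: the presumably small constant applied to the non-max scores.
    static float combine(float[] fieldScores, float tieBreaker) {
        if (fieldScores.length == 0) {
            return 0f; // assumption: no matching field contributes nothing
        }
        float max = Float.NEGATIVE_INFINITY;
        float sum = 0f;
        for (float s : fieldScores) {
            sum += s;
            if (s > max) {
                max = s;
            }
        }
        // sum - max is the sum of the other (non-maximal) scores.
        return max + tieBreaker * (sum - max);
    }
}
```

With tieBreaker set to 0 this is a pure max.  With a small positive value,
a term matching in two fields with scores 2.0 and 1.0 combines to slightly
more than a term matching in one field with score 2.0, which is the
tie-breaking effect described above.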

  > -----Original Message-----
  > From: Paul Elschot [mailto:paul.elschot@xs4all.nl]
  > Sent: Saturday, December 11, 2004 2:05 PM
  > To: lucene-dev@jakarta.apache.org
  > Subject: Re: Boolean Scorer
  > 
  > Chuck,
  > 
  > On Friday 10 December 2004 23:12, Chuck Williams wrote:
  > > Paul,
  > >
  > > Would there be a way to get the best of both worlds?  E.g., could you
  > > factor the specializable score combination differently, so that one
  > > method was called with each new score to generate a state entity, while
  > > a final method computed the score from the state.  For both sum and max,
  > > the state entity could just be a float, not requiring an array.  The
  > > final operation for the sum with coord case would do the coord.  I
  > > haven't looked at the code carefully enough to see if this actually
  > > works, but it seemed worth mentioning.
  > 
  > It's simple enough to do some abstract method call instead of initializing
  > a sum or adding to it. The problem is that as long as such a call is not
  > effectively inlined by the JVM, it will cause a performance hit for the sum
  > case.
  > 
  > The latest version of the advanceAfterCurrent method that computes the
  > score is java protected. It can be overridden to make the best
  > of it in another world.
  > 
  > Regards,
  > Paul Elschot
  > 
  > 
  > ---------------------------------------------------------------------
  > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
  > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


