lucene-dev mailing list archives

From "Chuck Williams"
Subject RE: DefaultSimilarity 2.0?
Date Mon, 20 Dec 2004 19:25:08 GMT
I agree it makes sense to isolate variables for analysis and comparison.
It also would seem that we should get as much benefit out of this
exercise as possible.  So, how about multi-field docs with multiple
query test sets?   One test set (or more) could have only single-field
queries.  A simple way to do this might be to have three fields on the
documents:  title, body, and all (= title+body).  We could have just one
set of queries that is run twice, each time with a different parser
(parsing into "all", or parsing into "title" and "body").  That would
provide another interesting comparison -- a determination of whether or
not field-specific boosting is a benefit.
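
To make the setup concrete, here is a rough sketch of what I have in mind.
It is plain Java with no Lucene dependency, and every name in it
(FieldSetupSketch, makeDoc, toAllQuery, toTitleBodyQuery) is hypothetical,
made up only for illustration.  It builds the three fields for a document
and expands one user query two ways, emitting field:term strings in the
style of the query syntax.  A real test harness would presumably use the
actual parsers instead.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical sketch only: plain Java, no Lucene classes are used.
class FieldSetupSketch {

    // Build the three fields for one test document; "all" is title + body.
    static Map<String, String> makeDoc(String title, String body) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("title", title);
        doc.put("body", body);
        doc.put("all", title + " " + body);
        return doc;
    }

    // First run: every query term is directed at the single "all" field.
    static String toAllQuery(String userQuery) {
        StringJoiner out = new StringJoiner(" ");
        for (String term : userQuery.trim().split("\\s+")) {
            out.add("all:" + term);
        }
        return out.toString();
    }

    // Second run: every query term is searched in both "title" and "body".
    static String toTitleBodyQuery(String userQuery) {
        StringJoiner out = new StringJoiner(" ");
        for (String term : userQuery.trim().split("\\s+")) {
            out.add("(title:" + term + " body:" + term + ")");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> doc =
            makeDoc("Lucene scoring", "How similarity affects ranking");
        System.out.println(doc);

        // The same query set is expanded once per parsing strategy.
        String query = "similarity ranking";
        System.out.println(toAllQuery(query));
        System.out.println(toTitleBodyQuery(query));
    }
}
```

The point of keeping one query set and two expansions is that any
difference in ranking quality between the runs can be attributed to the
field handling rather than to the queries themselves.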


  > -----Original Message-----
  > From: Doug Cutting
  > Sent: Monday, December 20, 2004 9:34 AM
  > To: Lucene Developers List
  > Subject: Re: DefaultSimilarity 2.0?
  > Chuck Williams wrote:
  > > Finally, I'd suggest picking content that has multiple fields and
  > > allowing the individual implementations to decide how to search
  > > these fields -- just title and body would be enough.  I would like
  > > to use my MaxDisjunctionQuery and see how it compares to other
  > > approaches (e.g., the default MultiFieldQueryParser, assuming
  > > somebody uses that in the test).
  > I think that would be a good contest too, but I'd rather first just
  > focus on the ranking of single-field queries.  There are a number of
  > issues that come up with multi-field queries that I'd rather postpone
  > in order to reduce the number of variables we test at one time.
  > Doug
