lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chuck Williams" <ch...@manawiz.com>
Subject RE: structuring a multifield query
Date Sat, 04 Dec 2004 21:24:34 GMT
There are at least 3 different ways to achieve this.  You need to expand
the query such that the terms which spread a single word across fields
are combined differently from the terms which combine the words.  You
want to boost the score of the result based on different words matching,
but not count the scores of a single word matching in multiple fields as
highly.

I created a MaxDisjunctionQuery to use as the expansion of a single word
across multiple fields precisely to solve this problem.  You combine the
MaxDisjunctionQuery's within an encompassing BooleanQuery with one
field-expanding MaxDisjunctionQuery for each separate word.

Paul Elschot created a DisjunctionQuery that can be used similarly.

Doug Cutting proposed a solution using just BooleanQuery's with
different Similarity's (and some method specialization as I recall).

If you search the archives for MaxDisjunctionQuery and DisjunctionQuery
you'll find the full discussion and code.

I think that MultiFieldQueryParser should use one of these techniques,
but unfortunately it doesn't, so for most purposes I can think of it
generates poor relevance ranking.  I don't use it because of this.

Chuck

  > -----Original Message-----
  > From: Chris Fraschetti [mailto:fraschetti@gmail.com]
  > Sent: Saturday, December 04, 2004 10:25 AM
  > To: Lucene Users List
  > Subject: structuring a multifield query
  > 
  > Hey there folks.. I'm having a bit of trouble figuring out how to
use
  > and AND term among all terms, but spread throughout the fields.
  > 
  > I currently use the static MultiFieldQueryParser method to parse my
  > query..
  > 
  > basically I have several fields.. title, contents, and a few
others...
  > and I would like a query such as    "super computer" to do the
  > standard distribution amongst the fields, but I believe currently
all
  > these terms are done in an OR fashion.. but I would prefer it so
that
  > they much all match somewhere.. i.e.  "super" could match in the
title
  > and "computer" in the content, but they must match somewhere....  I
  > obviously was able to do this when I queried a single field, is
there
  > a schema to apply this to multiple fields?
  > 
  > --
  > ___________________________________________________
  > Chris Fraschetti, Student
  > http://meteora.cs.usfca.edu
  > 
  >
---------------------------------------------------------------------
  > To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
  > For additional commands, e-mail: lucene-user-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-user-help@jakarta.apache.org


Mime
View raw message