lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Clemens Marschner" <c...@lanlab.de>
Subject term Boosting + AND semantics
Date Mon, 12 Aug 2002 16:01:02 GMT
I want to boost terms found in header lines of HTML pages more than those found in other parts
of the text. (just an example which can be applied to different use cases).
Therefore I put text enclosed with <h1> tags in a special field (call it "hd") and the
other text in another field.

How can I rewrite the query so that I can reach AND semantics? That is,

- if terms of the query are found in the headers, the result docs are ranked higher
- if some terms of the query are found in the headers and some in the rest, it is ok. if they
are found in both, even better.
- if only one term of the whole query is not found in either of these fields, no results are
returned

think of the query
  my little query
AND semantics means something like
  +my +little +query
splitting this into two fields would be something like
  +hd:(+my +little +query)^2 +(+my +little +query)
but apparently, this will need the terms to occur in both fields.

The only idea I can think of is to provide every possible combination like
  (+hd:(my little query)^2 +(+my +little +query)) 
  (+hd:(my little +query)^2 +(+my +little query)) 
  (+hd:(my +little query)^2 +(+my little +query)) 
  ...
which is awkward. Any ideas? 

Clemens


Mime
  • Unnamed multipart/alternative (inline, None, 0 bytes)
View raw message