lucene-java-user mailing list archives

From "Walt Stoneburner" <walt.stonebur...@gmail.com>
Subject Re: ORs and Ranks
Date Wed, 07 Jan 2009 00:05:42 GMT
Erick,

  Thanks for taking a moment to address my question.  I suspect the
confusion expressed in the answer was from a slight transcription error that
added additional punctuation.

  In your reply, the query was expressed using fields (note the extra
colons, which change the meaning of the query entirely):
  MEDICAL:CAT^10 OR ANIMAL:CAT

  I'm actually using the defaults and no custom fields, so without the
colons MEDICAL and ANIMAL are plain terms, not field names.  Here's the
original query:
   ( +MEDICAL CAT^2 )  OR  ( +ANIMAL CAT^-2 )

  Luke, which I was using to analyze this, has a problem with the numerical
value of negative two.  So, let's rewrite the query using a different,
parseable number, like zero:
   ( +MEDICAL CAT^2 )  OR  ( +ANIMAL CAT^0 )

  The question I'm trying to phrase is: Is there a way to make the rank of
a SHOULD term conditional?

  In the example, I'm trying to express "If the term MEDICAL is found, the
term CAT ranks high; if the term ANIMAL is found, the term CAT ranks low."

  In your reply, you also stated, "Remember too that Lucene query logic
isn't strictly Boolean..."  This is my understanding as well, so I don't see
how this could work at all.

  The users I'm dealing with are looking at the query as one might an
expression in the C or Java language, where it either does the left half or
the right half.  My understanding is that the expression as a whole gets
reduced to something else entirely.

  And that's where things get weird.

  According to Luke, I get two SHOULD clauses, each with a MUST and a
SHOULD.   As I understood things, a SHOULD *term* merely affects the ranking
of the results; it doesn't affect what gets brought back.  So I'm trying to
understand what a SHOULD *clause* does in this case.  More importantly, what
does it logically mean to "should have a must"?   That's like saying I have
an optional mandatory term.
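
  One way to read the structure Luke shows, sketched in plain Java rather
than with Lucene's own classes: each inner clause matches only when its MUST
term is present, and an outer query made up solely of SHOULD clauses matches
when at least one of them does.  The class and method names below are
hypothetical, and the rule about SHOULD-only queries is an assumption about
how Lucene evaluates them that is worth confirming against the BooleanQuery
documentation.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy model only: this does not call Lucene, it just encodes the assumed matching rule.
final class NestedClauseSketch {

    // Inner clause "+required CAT": the MUST term decides whether it matches;
    // the SHOULD term (CAT) plays no part in matching, only in ranking.
    static boolean innerMatches(Set<String> doc, String required) {
        return doc.contains(required);
    }

    // Outer query "( +MEDICAL CAT ) OR ( +ANIMAL CAT )": two SHOULD clauses and
    // no MUST clause, so at least one inner clause has to match.
    static boolean outerMatches(Set<String> doc) {
        return innerMatches(doc, "MEDICAL") || innerMatches(doc, "ANIMAL");
    }

    // Helper: treat a document as the set of terms it contains.
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        System.out.println("MEDICAL CAT matches: " + outerMatches(doc("MEDICAL", "CAT")));
        System.out.println("ANIMAL only matches: " + outerMatches(doc("ANIMAL")));
        System.out.println("CAT only matches:    " + outerMatches(doc("CAT")));
    }
}
```

  Under that reading, "should have a must" means "this branch is optional,
but if it is to count at all, its MUST term has to be there."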

  Or, is Lucene _really_ doing two separate sub-expressions?  Looking at the
data structures generated, it runs counter to my understanding of what
has to be happening under the hood.

  Perhaps Lucene really can do this after all?

  And, if not, is there a programmatic way to do this directly with the API?

  Is it even possible to express this construct as a single expression or
data structure for the API:
    1.   +( MEDICAL ANIMAL )    You must have MEDICAL, ANIMAL, or both.
    2.   If MEDICAL is present, then CAT ranks high; else, if ANIMAL is
         present, then CAT ranks low; otherwise the presence of the term
         CAT has no influence on rank.
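
  To make the two requirements concrete, here is a minimal sketch of the
intended ranking in plain Java.  It is a toy model, not Lucene's scoring:
every matching term is assumed to contribute 1, boosts are simple
multipliers, and the scores of matching sub-queries are summed.  The class
name, the constants, and the scoring rule are all illustrative assumptions.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Toy scoring model only; it does not call Lucene and ignores tf-idf, norms, and coord.
final class ConditionalBoostSketch {

    // Hypothetical boosts, mirroring CAT^2 and CAT^0 from the query above.
    static final double MEDICAL_CAT_BOOST = 2.0;
    static final double ANIMAL_CAT_BOOST = 0.0;

    // Score of one "+context CAT^boost" sub-query: zero unless the context
    // term is present; CAT adds its boost only on top of a matching context.
    static double subScore(Set<String> doc, String context, double catBoost) {
        if (!doc.contains(context)) {
            return 0.0;
        }
        return 1.0 + (doc.contains("CAT") ? catBoost : 0.0);
    }

    // The outer query sums whichever sub-queries matched; 0.0 means "not a hit",
    // which also enforces requirement 1 (MEDICAL, ANIMAL, or both).
    static double score(Set<String> doc) {
        return subScore(doc, "MEDICAL", MEDICAL_CAT_BOOST)
             + subScore(doc, "ANIMAL", ANIMAL_CAT_BOOST);
    }

    // Helper: treat a document as the set of terms it contains.
    static Set<String> doc(String... terms) {
        return new HashSet<>(Arrays.asList(terms));
    }

    public static void main(String[] args) {
        System.out.println("MEDICAL CAT -> " + score(doc("MEDICAL", "CAT")));
        System.out.println("ANIMAL CAT  -> " + score(doc("ANIMAL", "CAT")));
        System.out.println("MEDICAL     -> " + score(doc("MEDICAL")));
        System.out.println("CAT         -> " + score(doc("CAT")));
    }
}
```

  Note that this is a sum, not an if/else: with a non-zero low boost, a
document containing both MEDICAL and ANIMAL would collect both CAT
contributions, which differs from the strict "else, if" wording in item 2.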

Many thanks,
-wls
