lucene-dev mailing list archives

From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: Qs re: document scoring and semantics
Date Tue, 19 Feb 2002 16:32:14 GMT
> From: Joshua O'Madadhain [mailto:jmadden@ics.uci.edu]
> 
> Is either of the expressions below the correct parenthesization of the
> expression above?  If not, what is?
> 
> score_d = sum_t(tf_q * (idf_t / norm_q) * tf_d * (idf_t / norm_d_t) *
> boost_t) * coord_q_d

That's correct.  The tf*idf weights are normalized for document length.  I
would parenthesize it:
  ((tf_q * idf_t) / norm_q) * ((tf_d * idf_t) / norm_d_t) * boost_t
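To make the grouping concrete, here is a minimal sketch of the arithmetic in plain Java. It does not use Lucene; the class name ScoreSketch, the method names, and the sample numbers are all made up for illustration, and the way Lucene actually derives tf, idf, the norms, and coord is not shown here.

```java
// Illustrative sketch only: ScoreSketch and its methods are hypothetical,
// not part of Lucene. Inputs are assumed to be precomputed numbers.
final class ScoreSketch {

    // One term's contribution, grouped as in the message:
    // ((tf_q * idf_t) / norm_q) * ((tf_d * idf_t) / norm_d_t) * boost_t
    static double termScore(double tfQ, double tfD, double idfT,
                            double normQ, double normDT, double boostT) {
        double queryWeight = (tfQ * idfT) / normQ;   // normalized query-side weight
        double docWeight = (tfD * idfT) / normDT;    // normalized document-side weight
        return queryWeight * docWeight * boostT;
    }

    // score_d = sum_t(termScore) * coord_q_d.
    // Each row of terms is {tf_q, tf_d, idf_t, norm_q, norm_d_t, boost_t}.
    static double score(double[][] terms, double coordQD) {
        double sum = 0.0;
        for (double[] t : terms) {
            sum += termScore(t[0], t[1], t[2], t[3], t[4], t[5]);
        }
        return sum * coordQD;
    }

    public static void main(String[] args) {
        // Made-up values for two query terms.
        double[][] terms = {
            {1, 2, 3, 2, 4, 1},
            {1, 1, 2, 2, 4, 1},
        };
        System.out.println(score(terms, 0.5)); // prints 1.375
    }
}
```

With these sample values the two terms contribute 2.25 and 0.5, and the sum is scaled by a coord factor of 0.5. Grouping idf_t with the norm first, as in the quoted expression, gives the same product; only the reading differs.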

> (2) I'm trying to make sure that I have a handle on the semantics of
> BooleanQuery [ ... ]
> 
> * BooleanQuery.add(query, false, false) is equivalent to Boolean OR.  
> * BooleanQuery.add(query, true, false) is equivalent to Boolean AND.  All
> * BooleanQuery.add(query, false, true) is equivalent to Boolean NAND.

That's more-or-less correct.

To be precise: a binary boolean OR is implemented by:
   BooleanQuery query = new BooleanQuery();
   query.add(clause1, false, false);
   query.add(clause2, false, false);

A binary boolean AND is implemented by:
   BooleanQuery query = new BooleanQuery();
   query.add(clause1, true, false);
   query.add(clause2, true, false);

A binary boolean NAND is implemented by either:
   BooleanQuery query = new BooleanQuery();
   query.add(clause1, true, false);
   query.add(clause2, false, true);
or
   BooleanQuery query = new BooleanQuery();
   query.add(clause1, false, false);
   query.add(clause2, false, true);

> If these, and the semantics for "required" and "prohibited" (in
> BooleanQuery.add()), are accurate, then the semantics seem rather odd to
> me, so I'm hoping that someone will tell me that I'm wrong.  :) In
> particular, it seems to me that if you create a BooleanQuery and add a
> single TermQuery tq to it with add(tq, false, false) then, according to
> the semantics of "required" and "prohibited", *any* document will match
> the query...which clearly doesn't make sense.

That is not the case.  Such a query will return all documents containing the
term.  This is equivalent to a unary OR.  A document that contains none of
the query terms is never returned.
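The matching rules described above can be modeled with a toy matcher in plain Java. This is only a sketch of the semantics as stated in this message, not Lucene's implementation: the class BooleanSketch is hypothetical, a clause is reduced to a single term, and a document is reduced to a set of terms. The assumed rule is that every required clause must match, no prohibited clause may match, and at least one non-prohibited clause must match.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: BooleanSketch is hypothetical and is not Lucene code.
final class BooleanSketch {

    private static final class Clause {
        final String term;
        final boolean required;
        final boolean prohibited;

        Clause(String term, boolean required, boolean prohibited) {
            this.term = term;
            this.required = required;
            this.prohibited = prohibited;
        }
    }

    private final List<Clause> clauses = new ArrayList<>();

    // Mirrors the (clause, required, prohibited) argument order used above.
    void add(String term, boolean required, boolean prohibited) {
        clauses.add(new Clause(term, required, prohibited));
    }

    // Assumed rule: all required clauses hit, no prohibited clause hits,
    // and at least one non-prohibited clause hits.
    boolean matches(Set<String> doc) {
        boolean anyPositiveHit = false;
        for (Clause c : clauses) {
            boolean hit = doc.contains(c.term);
            if (c.prohibited) {
                if (hit) return false;       // a prohibited term excludes the document
            } else if (hit) {
                anyPositiveHit = true;       // an optional or required term matched
            } else if (c.required) {
                return false;                // a required term is missing
            }
        }
        return anyPositiveHit;
    }

    public static void main(String[] args) {
        Set<String> docWithTerm = new HashSet<>(Arrays.asList("apple", "pear"));
        Set<String> docWithout = new HashSet<>(Arrays.asList("plum"));

        // A single optional clause: the "unary OR" case from the question.
        BooleanSketch unary = new BooleanSketch();
        unary.add("apple", false, false);

        System.out.println(unary.matches(docWithTerm)); // prints true
        System.out.println(unary.matches(docWithout));  // prints false
    }
}
```

Under this model a query holding one optional clause matches only documents containing that term, which is the point made above: the absence of a "required" flag does not make every document a match.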

> (3) Somewhat unrelated question: what are the semantics and purpose of
> FilteredTermEnum.difference()?  (I see where and how it's used in the
> source but I don't understand the motivation.)

I did not implement this and cannot speak for it, but it appears to be
unused.

> (4) I'm still somewhat puzzled by MultiTermQuery.
> Could someone please explain what MultiTermQuery is for,
> how it should be used, etc.?

MultiTermQuery is a base class used to implement FuzzyQuery and
WildcardQuery.  These queries generate sets of terms by enumerating terms
from the index and filtering.  Ideally it would not be public, as it is not
intended for end users.
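The enumerate-and-filter idea can be sketched in plain Java as follows. Nothing here is Lucene API: TermExpansionSketch and expand are hypothetical names, the sorted set stands in for the index's term dictionary, and a simple prefix test stands in for whatever matching a wildcard or fuzzy query would apply.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.function.Predicate;

// Illustrative sketch only: not Lucene code.
final class TermExpansionSketch {

    // Walk every term in the (sorted) term dictionary and keep those the
    // filter accepts. The filter is an assumed stand-in for wildcard or
    // fuzzy matching.
    static List<String> expand(SortedSet<String> indexTerms, Predicate<String> accept) {
        List<String> matched = new ArrayList<>();
        for (String term : indexTerms) {
            if (accept.test(term)) {
                matched.add(term);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        SortedSet<String> indexTerms =
            new TreeSet<>(Arrays.asList("nutch", "luck", "lucene", "lucid"));

        // Stand-in for a pattern such as "luc*".
        List<String> expanded = expand(indexTerms, t -> t.startsWith("luc"));
        System.out.println(expanded); // prints [lucene, lucid, luck]
    }
}
```

The resulting term set is what such a query would then search for; how the terms are combined and scored is outside this sketch.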

Doug


