lucene-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Chuck Williams" <ch...@manawiz.com>
Subject RE: How to proceed with Bug 31841 - MultiSearcher problems with Similarity.docFreq() ?
Date Sat, 15 Jan 2005 06:33:22 GMT
Doug Cutting wrote:
  > That sounds hairy.  Why not just add a single new method:
  > 
  >     boolean Query.isRewritten() { returns true; }
  > 
  > Then override this in TermQuery, PhraseQuery and SpanQuery to return
  > false, and in BooleanQuery to walk its clauses and return true iff
any
  > of them return true.  As an optimization, RemoteSearchable could
avoid
  > calling rewrite() when this is true.

I agree that simplification makes more sense.  This is the same method
as what I had called rewriteRequired (which to me is a more intuitive
name since isRewritten()==true doesn't mean the rewrite is done, rather
that it needs to be done).  The other parts of the mechanism I first
proposed do things that upon reflection we don't really need to do.

Chuck

  > -----Original Message-----
  > From: Doug Cutting [mailto:cutting@apache.org]
  > Sent: Friday, January 14, 2005 9:33 AM
  > To: Lucene Developers List
  > Subject: Re: How to proceed with Bug 31841 - MultiSearcher problems
with
  > Similarity.docFreq() ?
  > 
  > Chuck Williams wrote:
  > > Doug Cutting wrote:
  > >   > It would indeed be nice to be able to short-circuit rewriting
for
  > >   > queries where it is a no-op.  Do you have a proposal for how
this
  > > could
  > >   > be done?
  > >
  > > First, this gets into the other part of Bug 31841.  I don't
believe
  > > MultiSearcher.rewrite() is ever called.  Rewriting is done in the
  > > Weight's, which invoke the rewrite() method of the Searcher, which
is
  > > always the Seacher invoked by the MultiSearcher, not the
MultiSearcher
  > > itself.
  > 
  > This would be fixed by the proposal under consideration.  Weights
would
  > be constructed much earlier, using the top-level Searcher, so
rewrites
  > would use this too.
  > 
  > > In fact, MultiSearcher.rewrite() is broken.  It requires
  > > Query.combine() which is unsupported except for the derived
queries
  > > (i.e., those for which rewriting is not a no-op).  When I added
  > > topmostSearcher to get the Weight's to call the
  > MultiSearcher.docFreq(),
  > > that also caused them to call MultiSearcher.rewrite() which blows
up
  > on,
  > > for example, a simple TermQuery, because there is no
  > > TermQuery.combine().  That's why my patch contains a new default
  > > implementation for Query.combine() (which as noted in the bug
report
  > is
  > > probably not a good idea in general).
  > >
  > > So, I don't believe there is any valid rewrite() implementation
for
  > > MultiSearcher to start from, unless I've completely misunderstood
  > > something.
  > 
  > It looks like MultiSearcher.rewrite() was never implemented
correctly
  > since it was never called -- a latent bug.  It only needs to be
called
  > when queries are rewritten to something different:
  > 
  >    public Query rewrite(Query original) throws IOException {
  >      Query[] queries = new Query[searchables.length];
  >      boolean changed = false;
  >      for (int i = 0; i < searchables.length; i++) {
  >        Query rewritten = searchables[i].rewrite(original);
  >        changed = !rewritten.equals(original);
  >        queries[i] = rewritten;
  >      }
  >      if (changed) {
  >        return original.combine(queries);
  >      } else {
  >        return original;
  >      }
  >    }
  > 
  > Then we'll need an implementation of combine() for all query types.
The
  > implementation for BooleanQuery is fairly simple: combine() each of
the
  > corresponding clauses.  For TermQuery, PhraseQuery and SpanQuery
combine
  > should create a deduplicated OR.  Derived queries already have an
  > implementation.
  > 
  > > To address the question above, RemoteSearchable.rewrite() should
be a
  > > no-op, i.e. always return this.  For good error handling, it
should
  > > verify that the query does not require rewriting.  This requires
some
  > > mechanism to determine whether or not a query requires rewriting.
The
  > > challenge here is that some query types have a non-trivial
rewrite()
  > > method not because they require rewriting, but because they might
have
  > > subqueries that require rewriting (e.g., BooleanQuery).  Other
query
  > > types (e.g., MultiTermQuery) always require rewriting, while those
  > that
  > > implement Weight's never require it.  I think an upward
  > incompatibility
  > > is required in the API to address this.
  > >
  > > If that is acceptable, then this could work:
  > >   1.  Add a new interface called Rewritable that specifies a
boolean
  > > rewriteRequired() method.
  > >   2.  Have Query implement Rewritable but NOT provide an
  > implementation
  > > for rewriteRequired().  This will force all applications to add
  > support
  > > for this in order to upgrade.
  > >   2.  Change all the Weight's to call Query.maybeRewrite() instead
of
  > > Query.rewrite().
  > >   3.  Have Query.maybeRewrite() only call Query.rewrite() if
  > > Query.rewriteRequired() is true.
  > >   4.  Have RemoteSearchable.maybeRewrite() throw an Exception if
  > > Query.rewriteRequired() is true.
  > >   5.  Implement rewriteRequired() for all the built-in Query types
  > > (which is either true for derived queries, false for primitive
queries,
  > > or an or of rewriteRequired() for all the subqueries).
  > 
  > That sounds hairy.  Why not just add a single new method:
  > 
  >     boolean Query.isRewritten() { returns true; }
  > 
  > Then override this in TermQuery, PhraseQuery and SpanQuery to return
  > false, and in BooleanQuery to walk its clauses and return true iff
any
  > of them return true.  As an optimization, RemoteSearchable could
avoid
  > calling rewrite() when this is true.
  > 
  > Doug
  > 
  >
---------------------------------------------------------------------
  > To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
  > For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org


Mime
View raw message