lucene-dev mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 31841] - [PATCH] MultiSearcher problems with Similarity.docFreq()
Date Wed, 27 Apr 2005 15:16:01 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=31841>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.


------- Additional Comments From chuck@manawiz.com  2005-04-27 17:15 -------
Wolf's revisions to my changes to Query.combine() look fine.  The single-query
optimization is good -- my oversight to have not included it originally.  I
don't believe either of the other two changes is necessary, but they are correct:
  1.  Using a flag instead of the labelled loop is a matter of style as Wolf
says, and it's a little less efficient (the biggest effect could be remedied by
one more if (splittable) to avoid unnecessarily copying the clauses of a
BooleanQuery where coord is not disabled).
  2.  Changing BooleanQuery equality to be independent of clause order is
semantically correct, although again it is a little less efficient.  Its only
purpose is to prevent a false negative in the new tests.
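
To make point 2 concrete, here is a minimal sketch of order-independent
clause comparison.  It is not the BooleanQuery code from the patch: clauses
are modelled as plain strings, and the names ClauseEqualityDemo,
equalsOrdered and equalsUnordered are hypothetical.  The clause lists are
compared as multisets, so duplicates still count.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for comparing the clause lists of two boolean queries.
// Clauses are plain strings such as "body:lucene"; this is not the Lucene API.
class ClauseEqualityDemo {

    // Count how often each clause occurs, so duplicate clauses are respected.
    static Map<String, Integer> counts(List<String> clauses) {
        Map<String, Integer> result = new HashMap<>();
        for (String clause : clauses) {
            result.merge(clause, 1, Integer::sum);
        }
        return result;
    }

    // Order-sensitive comparison: same clauses in the same positions.
    static boolean equalsOrdered(List<String> a, List<String> b) {
        return a.equals(b);
    }

    // Order-independent comparison: same clauses with the same multiplicities.
    static boolean equalsUnordered(List<String> a, List<String> b) {
        return a.size() == b.size() && counts(a).equals(counts(b));
    }

    public static void main(String[] args) {
        List<String> first = Arrays.asList("body:apache", "body:lucene");
        List<String> second = Arrays.asList("body:lucene", "body:apache");
        System.out.println(equalsOrdered(first, second));
        System.out.println(equalsUnordered(first, second));
    }
}
```

Building the two count maps is the extra cost relative to a positional
comparison.  Any hashCode() paired with such an equals() has to be
order-independent as well, for example a sum of the clause hash codes.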

Regarding additional test cases, it would be helpful to add the cases I was
concerned about, which are situations where a query can rewrite into different
kinds of fundamental queries depending on the reader.  I believe the only case
where this occurs with the built-in queries is with MultiTermQuery and
RangeQuery (where the rewrite depends on how many query clauses are generated
by each reader), and we have covered those cases.  The coord testing in
Query.combine() is designed to handle the case where some query rewrites into a
different kind of BooleanQuery (e.g., an AND) in some readers and not others.
Nothing tests this at present.  A single-term BooleanQuery OR could rewrite into
a BooleanQuery AND, but this would be independent of reader.
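
As an illustration of a reader-dependent rewrite, the sketch below expands a
prefix against two different term lists.  It is a simplified model, not
Lucene code: a "reader" is just a list of terms, the result is a descriptive
string, and the names ReaderDependentRewrite and rewritePrefix are
hypothetical.  It assumes a single matching term collapses to a term query
while several matches produce an OR.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model: a "reader" is just the list of terms in one index.
class ReaderDependentRewrite {

    // Expand a prefix against one reader's terms and describe the query
    // that the expansion would produce for that reader.
    static String rewritePrefix(String prefix, List<String> readerTerms) {
        List<String> matches = new ArrayList<>();
        for (String term : readerTerms) {
            if (term.startsWith(prefix)) {
                matches.add(term);
            }
        }
        if (matches.size() == 1) {
            // Assumed single-clause optimization: return the lone clause.
            return "TermQuery(" + matches.get(0) + ")";
        }
        return "BooleanQuery(OR " + matches + ")";
    }

    public static void main(String[] args) {
        List<String> readerA = Arrays.asList("apple", "banana");
        List<String> readerB = Arrays.asList("apple", "apply");
        System.out.println(rewritePrefix("app", readerA)); // TermQuery(apple)
        System.out.println(rewritePrefix("app", readerB)); // BooleanQuery(OR [apple, apply])
    }
}
```

The same query thus yields two different kinds of rewritten query across
the sub-searchers, which is the situation the combining step has to handle.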

Many additional optimizations could be added.  It seems redundant to have
optimizations here and in the rewrite mechanism.  Since we are down to just
Query.combine(), only called from one place, I think a better fix is to change
MultiSearcher to pass the reader as well.  Then Query.combine() could construct
the straightforward BooleanQuery and rewrite it.  All the optimizations would
then go into a single place, the rewrite methods.  Wolf, what do you think of
that approach?
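
A rough sketch of the shape I have in mind, using a hypothetical miniature
query model rather than the real classes (Reader, Query, Term, Or and
CombineSketch are all made up for illustration).  combine() only builds the
plain OR and delegates; the particular simplifications shown in rewrite()
(flattening, de-duplication, single-clause collapse) are assumptions chosen
to show where such logic would live.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical miniature query model; none of these types are Lucene classes.
class CombineSketch {

    interface Reader {}  // placeholder for an index reader

    interface Query {
        Query rewrite(Reader reader);
    }

    static final class Term implements Query {
        final String text;
        Term(String text) { this.text = text; }
        public Query rewrite(Reader reader) { return this; }
        @Override public boolean equals(Object o) {
            return o instanceof Term && ((Term) o).text.equals(text);
        }
        @Override public int hashCode() { return text.hashCode(); }
        @Override public String toString() { return text; }
    }

    static final class Or implements Query {
        final List<Query> clauses;
        Or(List<Query> clauses) { this.clauses = clauses; }

        // All simplification lives here: rewrite the children, flatten
        // nested ORs, drop duplicates, and collapse a single clause.
        public Query rewrite(Reader reader) {
            Set<Query> unique = new LinkedHashSet<>();
            for (Query clause : clauses) {
                Query rewritten = clause.rewrite(reader);
                if (rewritten instanceof Or) {
                    unique.addAll(((Or) rewritten).clauses);
                } else {
                    unique.add(rewritten);
                }
            }
            if (unique.size() == 1) {
                return unique.iterator().next();
            }
            return new Or(new ArrayList<>(unique));
        }
        @Override public String toString() { return "OR" + clauses; }
    }

    // The combining step itself carries no optimizations of its own.
    static Query combine(List<Query> queries, Reader reader) {
        return new Or(queries).rewrite(reader);
    }

    public static void main(String[] args) {
        Reader reader = new Reader() {};
        Query a = new Term("body:lucene");
        Query b = new Term("body:apache");
        System.out.println(combine(Arrays.asList(a, a), reader));
        System.out.println(combine(Arrays.asList(a, b), reader));
    }
}
```

With this shape the single-query case needs no special handling in
combine(): it falls out of the single-clause collapse in rewrite().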



