From: Wolf Siberski
Date: Thu, 28 Apr 2005 11:27:10 +0200
To: java-dev@lucene.apache.org
Subject: Re: DO NOT REPLY [Bug 31841] - [PATCH] MultiSearcher problems with Similarity.docFreq()
Message-ID: <4270AC6E.1090008@l3s.de>
In-Reply-To: <20050427151601.3C5CE2DE@ajax.apache.org>

> ------- Additional Comments From chuck@manawiz.com 2005-04-27 17:15 -------
> Wolf's revisions to my changes to Query.combine() look fine. The single-query
> optimization is good -- my oversight to have not included it originally. I
> don't believe either of the other two changes is necessary, but they are correct:
> 1. Using a flag instead of the labelled loop is a matter of style as Wolf
> says, and it's a little less efficient (the biggest effect could be remedied by
> one more if (splittable) to avoid unnecessarily copying the clauses of a
> BooleanQuery where coord is not disabled).

Yep, the additional if... should be added.

> 2. Changing BooleanQuery equality to be independent of clause order is
> semantically correct, although again it is a little less efficient. Its only
> purpose is to stop a false-negative in the new tests.

Here I don't agree. The previous implementation was incorrect, and the new
tests discovered that bug. I also considered correcting this by ensuring a
defined order of clauses, or by replacing the vector with a set. That would
have performed slightly better, but would have needed much more effort and
might have caused unwanted side effects. In general, IMHO query processing
performance is nearly always dominated by index accesses, and in the few
cases where query preparation takes a significant share, the whole
processing will be fast enough anyway. So I don't see a need to squeeze out
the last few processing cycles from query preparation.

> Many additional optimizations could be added. It seems redundant to have
> optimizations here and in the rewrite mechanism.
> Since we are down to just Query.combine(), only called from one place, I
> think a better fix is to change MultiSearcher to pass the reader as well.
> Then Query.combine() could construct the straightforward BooleanQuery and
> rewrite it. All the optimizations would then go into a single place, the
> rewrite methods. Wolf, what do you think of that approach?

Yes, there is a problem of code duplication. But I don't yet understand
your proposal. Which reader could the MultiSearcher pass? Inside
MultiSearcher we only have Searchables, which don't (and probably
shouldn't) expose their readers.

Another way to approach the problem would be to split the rewriting process
into two phases: in the first phase the query is rewritten into a
combination of term queries, and in the second phase this combination is
optimized. The second phase no longer needs the reader. The MultiSearcher
could then delegate the first phase to its Searchables (as before), combine
the resulting queries by just joining them, and then call the optimization
method on the combined query.

If there are no objections, I could try whether that works.

--Wolf
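
The two-phase split described above could be sketched roughly as follows. This is only an illustration under stated assumptions: ToySearchable, rewriteToTerms, optimize, and rewrite are hypothetical names, not Lucene APIs, and a prefix expansion over plain strings stands in for the real rewrite into term queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch only: none of these types are Lucene classes.
class TwoPhaseRewriteSketch {

    /** Hypothetical stand-in for a Searchable that keeps its term dictionary private. */
    static class ToySearchable {
        private final List<String> indexedTerms;

        ToySearchable(String... indexedTerms) {
            this.indexedTerms = Arrays.asList(indexedTerms);
        }

        /** Phase 1 (needs index access): expand a prefix into the matching indexed terms. */
        List<String> rewriteToTerms(String prefix) {
            List<String> matches = new ArrayList<>();
            for (String term : indexedTerms) {
                if (term.startsWith(prefix)) {
                    matches.add(term);
                }
            }
            return matches;
        }
    }

    /** Phase 2 (no index access): join the per-searchable results and drop duplicates. */
    static List<String> optimize(List<List<String>> rewrittenPerSearchable) {
        Set<String> unique = new LinkedHashSet<>(); // keeps first-seen order
        for (List<String> terms : rewrittenPerSearchable) {
            unique.addAll(terms);
        }
        return new ArrayList<>(unique);
    }

    /** Coordinator role: delegate phase 1 to each searchable, then run phase 2 once. */
    static List<String> rewrite(String prefix, List<ToySearchable> searchables) {
        List<List<String>> perSearchable = new ArrayList<>();
        for (ToySearchable searchable : searchables) {
            perSearchable.add(searchable.rewriteToTerms(prefix));
        }
        return optimize(perSearchable);
    }

    public static void main(String[] args) {
        List<ToySearchable> searchables = Arrays.asList(
                new ToySearchable("lucene", "lucid", "search"),
                new ToySearchable("lucene", "luck"));
        System.out.println(rewrite("luc", searchables)); // prints [lucene, lucid, luck]
    }
}
```

The point of the sketch is that optimize() never touches a searchable, so the coordinator can run it without any reader being exposed; whether the real optimizations fit into such an index-free step is exactly what would have to be tried.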