From: Chris Hostetter
To: java-user@lucene.apache.org
Date: Mon, 13 Oct 2008 11:54:25 -0700 (PDT)
Subject: Re: Wildcard query ...

BooleanQuery picks a Scorer based on the number of clauses and what their options are ...
all of the scorers it might pick from are smart enough to continuously reorder the clauses, having them "skip ahead" to the next document they match, beyond whatever docIds it already knows can't match (based on the skipping of the other clauses). So it really doesn't matter what order the clauses appear in; it will optimize away as much work as it can.

While the *order* of clauses doesn't matter, the *structure* of clauses can. Beyond just having subtle scoring differences, these two queries...

   +(+A:X +B:Z) +(+C:Y +D:Z)
   +(+C:Y +B:Z) +(+A:X +D:Z)

...could have radically different performance characteristics, because the "skipping" happens at each level of the BooleanQuery hierarchy. If only one doc matches (+A:X +B:Z), then lots of skipping will happen in that first query, with only a few matches actually being tested for the other clauses and the query as a whole. But if lots of docs match (+C:Y +B:Z) and lots of *other* docs match (+A:X +D:Z), the subqueries won't ever skip very far.

The other factor to keep in mind is the wildcard expansion: Honda* will be expanded into all terms that start with Honda in that field before anything ever looks at what docs match any clauses of your query, even if only one doc matches Type:0. That is why index partitioning can make sense in some situations like this.

-Hoss
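
To make the "skip ahead" idea concrete, here is a toy sketch of how required clauses can leapfrog over each other's sorted doc id lists. It is not Lucene's scorer code: the class SkipAheadSketch, the methods advance and intersect, and the advanceCalls counter are all hypothetical names invented for this illustration, and plain int arrays stand in for postings. It only shows why the result and the amount of work barely depend on the order in which the clauses are listed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative only: a toy model of conjunction skipping, not Lucene's implementation. */
final class SkipAheadSketch {

    /** Hypothetical counter: how many skip-style calls were made in total. */
    static int advanceCalls = 0;

    /** Returns the first index >= from whose doc id is >= target, or docs.length if none. */
    static int advance(int[] docs, int from, int target) {
        advanceCalls++;
        int idx = Arrays.binarySearch(docs, from, docs.length, target);
        return idx >= 0 ? idx : -idx - 1;
    }

    /** Intersects sorted, duplicate-free doc id lists, one list per required clause. */
    static List<Integer> intersect(int[][] clauses) {
        List<Integer> hits = new ArrayList<>();
        if (clauses.length == 0) return hits;
        for (int[] c : clauses) {
            if (c.length == 0) return hits; // an empty required clause matches nothing
        }
        int[] pos = new int[clauses.length]; // current position in each list
        int candidate = clauses[0][0];       // doc id we are trying to confirm
        int agreed = 0;                      // consecutive clauses that contain candidate
        int i = 0;
        while (true) {
            // Skip this clause forward to the candidate; everything before it cannot match.
            pos[i] = advance(clauses[i], pos[i], candidate);
            if (pos[i] == clauses[i].length) return hits; // clause exhausted, no more hits
            int doc = clauses[i][pos[i]];
            if (doc == candidate) {
                agreed++;
            } else {
                candidate = doc; // this clause jumped past; its doc is the new candidate
                agreed = 1;
            }
            if (agreed == clauses.length) {
                hits.add(candidate);
                candidate++;     // look for the next match after this one
                agreed = 0;
            }
            i = (i + 1) % clauses.length;
        }
    }

    public static void main(String[] args) {
        int[] rare = {500};            // a clause matching a single doc
        int[] common = new int[1000];  // a clause matching docs 0..999
        for (int k = 0; k < common.length; k++) common[k] = k;

        // Same clauses, both orders: compare the hits and the number of skip calls.
        for (int[][] order : new int[][][] {{rare, common}, {common, rare}}) {
            advanceCalls = 0;
            List<Integer> hits = intersect(order);
            System.out.println("hits=" + hits + " advanceCalls=" + advanceCalls);
        }
    }
}
```

Whichever clause is listed first, the rare clause ends up dictating the candidates, so the common clause is skipped through rather than scanned. The sketch models a single flat level only; in a nested query each sub-BooleanQuery would do this leapfrogging over its own clauses, which is where the structure of the query starts to matter.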