Message-ID: <20060610134045.68375.qmail@web50310.mail.yahoo.com>
Date: Sat, 10 Jun 2006 06:40:45 -0700 (PDT)
From: Otis Gospodnetic
Subject: Re: Different scoring mechanism
To: java-user@lucene.apache.org

Chris,

Somebody recently asked me about how Lucene processes queries. Other than working on required clauses in a BooleanQuery first, and skipping if there are no matching Docs for them, there are no other query optimization strategies/tricks, are there?

Otis

----- Original Message ----
From: Chris Hostetter
To: java-user@lucene.apache.org
Sent: Friday, June 9, 2006 3:08:35 PM
Subject: RE: Different scoring mechanism

: For example: a query containing two terms: "fast", "car", having
: document frequencies 300.000 and 20.000 in the index respectively. In a
: worst case scenario this would require 320.000 document scores to be
: calculated. I am not really sure how lucene optimizes its search, but I
: guess it does that by first processing the documents having the highest
: term frequencies (and thus highest combined score) with these query
: terms, and pruning the search if the n hits have been found and it's
: certain that no document can be found which will give a higher score.

Nope. Lucene scores all "matching" documents in the index in increasing order of docId -- it can optimize the process using "skipTo" in Scorers when it knows that it's not possible for a document to "match" the overall query, so it "skips ahead" to the first doc that can match.

i.e.: if you have a boolean query like "+title:cat +title:dog body:snake" it knows that unless something matches title:cat and title:dog then there is no point in checking whether it matches body:snake -- let alone scoring the doc at all. so BooleanScorer uses skipTo on the individual Scorers for title:cat and title:dog to keep skipping ahead until it finds a doc matching both, then it checks if it matches body:snake, and if it does *then* it scores things.

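To make the skip-ahead idea concrete, here is a rough, self-contained sketch for "+title:cat +title:dog body:snake". It is not Lucene's code: PostingCursor and SkipToSketch are made-up names, the cursor only loosely mimics the next()/doc()/skipTo(int) methods on Scorer, the doc ids are invented, and the "score" is simply the number of clauses that matched, which is an assumption for illustration and not Lucene's scoring formula.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a per-term scorer: walks a sorted array of doc ids. */
final class PostingCursor {
    private final int[] docs; // doc ids in increasing order
    private int pos = -1;     // positioned before the first entry

    PostingCursor(int... docs) { this.docs = docs; }

    /** Moves to the next doc id; returns false once the list is exhausted. */
    boolean next() { return ++pos < docs.length; }

    /** Moves forward to the first doc id >= target (stays put if already there). */
    boolean skipTo(int target) {
        if (pos < 0) pos = 0;
        while (pos < docs.length && docs[pos] < target) pos++;
        return pos < docs.length;
    }

    /** Current doc id; only valid after next() or skipTo() returned true. */
    int doc() { return docs[pos]; }
}

final class SkipToSketch {
    /** Returns doc id -> number of matching clauses, in increasing doc id order. */
    static Map<Integer, Integer> search(PostingCursor cat, PostingCursor dog, PostingCursor snake) {
        Map<Integer, Integer> hits = new LinkedHashMap<>();
        boolean snakeLive = snake.next();
        if (!cat.next() || !dog.next()) return hits; // a required clause has no docs

        while (true) {
            int c = cat.doc(), d = dog.doc();
            if (c < d) {
                if (!cat.skipTo(d)) break;           // leap cat forward to dog's doc
            } else if (d < c) {
                if (!dog.skipTo(c)) break;           // leap dog forward to cat's doc
            } else {
                // Both required clauses match doc c; only now look at the optional one.
                if (snakeLive && snake.doc() < c) snakeLive = snake.skipTo(c);
                boolean bonus = snakeLive && snake.doc() == c;
                hits.put(c, bonus ? 3 : 2);          // placeholder score: clauses matched
                if (!cat.next() || !dog.next()) break;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<Integer, Integer> hits = search(
            new PostingCursor(1, 4, 7, 9, 20),   // title:cat
            new PostingCursor(2, 4, 9, 15, 20),  // title:dog
            new PostingCursor(3, 9, 11));        // body:snake
        System.out.println(hits); // prints {4=2, 9=3, 20=2}
    }
}
```

Docs 1, 2, 7 and 15 are stepped over without ever being scored, and docs 3 and 11 (snake only) never become candidates because they fail the required clauses.
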
: If I would change the next function in my own scorer to process all
: document ids, I am afraid I will wreck Lucene's optimization method (as
: I am then not serving the documents in descending term frequency order).

it would certainly eliminate Lucene's ability to skip ahead (although not in the way you imagined) ... but based on the way you've described how you want scoring to work, it has to score every doc no matter what -- you've said that even if it doesn't contain the term at all it may get a score value which needs to be factored in to the overall score.

-Hoss

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org