Message-ID: <11a4d6c20604281402q7335bed5k45d9b2812e704f1d@mail.gmail.com>
Date: Fri, 28 Apr 2006 17:02:10 -0400
From: "Daniel Shane"
To: java-dev@lucene.apache.org
Subject: Tips on building a better BooleanQuery

Hi!

I'm planning on contributing to Lucene by adding a new kind of query. I don't know what to call it yet, but it would be a mix of BooleanQuery and ExactPhraseQuery.

I would like a Query that behaves like a BooleanQuery, with one twist: it would boost results in which the query terms appear as an exact phrase.

For example, if I have terms A, B and C and I do a simple boolean search:

A B C

I would like a query that behaves roughly as if I had rewritten it as:

+A +B +C "A B" "B C" "A B C"

This would boost results where the exact string "A B C", or any substring such as "A B" or "B C", is found.

Of course I could rewrite every query this way, but searching with the rewritten queries takes far too long.

I wanted to know if anyone has ideas about which direction I should go in, and whether it would be easy to implement this by modifying or extending some of the existing Query classes.

I'm fairly new to Lucene, although I know a bit about search engines, idf, etc. I've tried to understand BooleanQuery and ExactPhraseQuery to see how I could modify them, but I'm having trouble working it all out on my own.

Any help or comments would be appreciated. If it works well, I think it would be a good addition to the Lucene code base (and, if it works OK, I think this query should be the QueryParser default instead of a simple BooleanQuery).

Thanks in advance for your help,
Daniel Shane
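
For reference, the manual rewrite described in the message can be sketched in plain Java. This is only an illustration of the expansion from A B C to +A +B +C "A B" "B C" "A B C"; it builds the query string and does not touch any Lucene classes. The class name PhraseBoostRewriter and the method rewrite are hypothetical names chosen for this sketch, and it assumes Java 8 or later for String.join.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Lucene): expands a list of terms into
// the query string described in the message.
public class PhraseBoostRewriter {

    /** Builds the expanded query string for the given terms. */
    static String rewrite(List<String> terms) {
        List<String> clauses = new ArrayList<>();

        // One required clause per term: +A +B +C
        for (String term : terms) {
            clauses.add("+" + term);
        }

        // One optional phrase clause per contiguous run of 2..n terms,
        // shortest runs first: "A B" "B C" "A B C"
        for (int len = 2; len <= terms.size(); len++) {
            for (int start = 0; start + len <= terms.size(); start++) {
                List<String> run = terms.subList(start, start + len);
                clauses.add("\"" + String.join(" ", run) + "\"");
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // Prints: +A +B +C "A B" "B C" "A B C"
        System.out.println(rewrite(Arrays.asList("A", "B", "C")));
    }
}
```

For n terms this produces n(n-1)/2 phrase clauses, so the rewritten query grows quadratically with the number of terms. That growth may be part of why the rewritten queries are slow, though the message does not say where the time actually goes.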