Message-ID: <44F81165.3030203@gmail.com>
Date: Fri, 01 Sep 2006 06:54:29 -0400
From: Mark Miller
To: java-user@lucene.apache.org
Subject: Re: Proximity Query Parser
References: <8834A84C87A2C148AD46921BB8BFC97C024497AD@S1SE1MAIL.emea1.ad.group> <44F7523F.5030106@gmail.com> <200609010918.18067.paul.elschot@xs4all.nl>
In-Reply-To: <200609010918.18067.paul.elschot@xs4all.nl>

Paul Elschot wrote:
> Mark,
>
> On Thursday 31 August 2006 23:18, Mark Miller wrote:
>
>> I am not a huge fan of the queryparser's syntax so I have started an
>> open source project to create a viable alternative. I could really use
>> some helping testing it out. The more I can get it tested the better
>> chance it has of serving the community. The parser is called Qsol. I am
>> right up against its initial release. So far it:
>>
>> offers a simple clean syntax.
>> allows arbitrary combinations/nesting of proximity and boolean queries.
>>
>
> Could you say in a few words how the combination of proximity and boolean
> is implemented in Qsol?
>
> I found this the most difficult thing to implement in surround. In surround,
> every subquery that can be a proximity subquery has two (groups of) methods:
> one for use as boolean and one for use as proximity.
> I'd like to have a mechanism that allows mixing proximity and boolean queries
> built into Lucene.
>
> Did you also implement parsed phrases with Lucene's PhraseQuery?
> Surround does not have that.
>
> Regards,
> Paul Elschot
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>

Hi Paul,

I'm afraid my programming is probably quite a ways behind yours, so I doubt anything I have done will be of any help to you. I also have to treat things differently depending on whether I am in a proximity clause or a boolean clause. A wildcard in a boolean clause is mapped to a wildcard query. A wildcard in a proximity clause is mapped to a regex span that has been modified to deal only with * and ?.

When I run into a proximity, I collect a small tree of each clause and distribute them against each other. For example,

    (old | map) ~3 big

gets distributed to

    old ~3 big | map ~3 big

This distribution method appears to handle all boolean/proximity nesting and mixing cases for me, including:

    great ! "big old phrase search" ~5 (holy ~4 (big black bear))

The distribution maintains order of operations, but it can obviously also create some pretty large queries.

I did not use the phrase search because I do not like how the slop works (not in order, etc.), so both in and out of proximity I use a nearspan instead. For a multiphrase search I use an OrSpan on words in the same position.

- Mark
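
The distribution step described in the reply above, where (old | map) ~3 big becomes old ~3 big | map ~3 big, can be sketched roughly as follows. This is only an illustration under stated assumptions, not Qsol's actual code: the Node, Term, Or, and Near classes are hypothetical, the result is built as plain text rather than as Lucene span queries, and nested proximity clauses are flattened into a string without parentheses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DistributeSketch {

    /** A node in a tiny, hypothetical query tree (not Qsol's real classes). */
    interface Node {
        /** Returns the OR-free alternatives this node expands to, in order. */
        List<String> expand();
    }

    /** A single search term. */
    static final class Term implements Node {
        final String text;
        Term(String text) { this.text = text; }
        public List<String> expand() {
            List<String> out = new ArrayList<>();
            out.add(text);
            return out;
        }
    }

    /** A boolean OR: its alternatives are those of all its children. */
    static final class Or implements Node {
        final List<Node> children;
        Or(Node... children) { this.children = Arrays.asList(children); }
        public List<String> expand() {
            List<String> out = new ArrayList<>();
            for (Node child : children) {
                out.addAll(child.expand());
            }
            return out;
        }
    }

    /** A proximity clause: every left alternative is paired with every right one. */
    static final class Near implements Node {
        final Node left;
        final int slop;
        final Node right;
        Near(Node left, int slop, Node right) {
            this.left = left;
            this.slop = slop;
            this.right = right;
        }
        public List<String> expand() {
            List<String> out = new ArrayList<>();
            for (String l : left.expand()) {
                for (String r : right.expand()) {
                    out.add(l + " ~" + slop + " " + r);
                }
            }
            return out;
        }
    }

    /** Joins the distributed alternatives back into one OR expression. */
    static String distribute(Node root) {
        return String.join(" | ", root.expand());
    }

    public static void main(String[] args) {
        Node query = new Near(new Or(new Term("old"), new Term("map")), 3, new Term("big"));
        System.out.println(distribute(query)); // prints: old ~3 big | map ~3 big
    }
}
```

Because each proximity clause pairs every left alternative with every right alternative, the number of generated clauses is the product of the alternative counts on both sides, which is why the distributed query can grow quickly.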