Message-ID: <359a92830702211640n6eb2eb94n7ec8306cc867a237@mail.gmail.com>
Date: Wed, 21 Feb 2007 19:40:38 -0500
From: "Erick Erickson"
To: java-user@lucene.apache.org
Subject: Re: Positions in SpanFirst
In-Reply-To: <45DCDF3D.4070305@teamware.com>
References: <45DC28E0.4070604@teamware.com>
 <359a92830702210516h2d0faa55id9a71f5e8bd1312c@mail.gmail.com>
 <45DCABC9.7060303@teamware.com>
 <359a92830702211359r2705dcbegdbfa6411b3b4b170@mail.gmail.com>
 <45DCDF3D.4070305@teamware.com>
MIME-Version: 1.0
Content-Type: multipart/alternative;
 boundary="----=_Part_15307_27255739.1172104838029"

------=_Part_15307_27255739.1172104838029
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

I really think you need to stop obsessing on SpanFirst. I suspect that
this is leading you down an unrewarding path. So I don't see why using a
SpanNear that respects order and a large IncrementGap won't solve your
problem...

Although it would return "odd" matches. Let's say you indexed
"first second third" as one name and then searched on a SpanNear of second
and third with a slop of 100. You'd get a match on a middle and last name
rather than a first and last name...

But I wonder if this can be tolerated given all the new capabilities you'll
doubtless be adding.

Best
Erick

On 2/21/07, Antony Bowesman wrote:
>
> Hi Erick,
>
> > What this does is allow you to put gaps between successive sets of terms
> > indexed in the same field. For instance...
> > doc.add("field", "some stuff");
> > doc.add("field", "bunch hooey");
> > doc.add("field", "what is this");
> > writer.add(doc);
> >
> > In this case, there would be the following positions, assuming that the
> > IncrementGap was 1000....
> > some 0
> > stuff 1
> > bunch 1002
> > hooey 1003
> > what 2004
> > is 2005
> > this 2006
>
> So, if you can add 1000, shouldn't setting 0 each time cause it to start at 0
> each time? The default Analyzer.getPositionIncrementGap always returns 0.
>
> >> That's a good point. The field is used to index mail recipients and
> >> currently has a "starts with" search (non Lucene implementation). Unless
> >> I can set the position increment gap, it is only ever possible to search
> >> for the first indexed recipient with proximity queries.
> >
> > This is confusing me. You can easily use proximity queries with the above
> > scenario. For instance, searching for "bunch hooey"~4 would generate a hit.
> > As would "bunch hooey"~10000. But "some this"~10 would not generate a hit.
> > Whether that does what you need is another question ... So it's time to
> > ask "what are you really trying to do?" In other words, what behavior are
> > you trying to mimic from the old code? It's not clear to me what the
> > behavior you need is. It'd help if you gave a concrete example of the raw
> > data, and what you want returned...
>
> Your example is good enough, just assume they are people's names :) I know
> I had a mail from Mrs Bunch Ogilvy, so I want to do a "starts with", i.e.
> SpanFirst for bunch, so I find all the first name bunches.
>
> > In your first example, using the above scheme, you'd get hits (using
> > SpanNear rather than SpanFirst) if you searched on
> > "first bit" in a SpanNear query with a slop of 2. You'd also get a hit if
> > you searched on
> > "second part" in a SpanNear with a slop of 2. Does this mimic the behavior
> > you need?
>
> No, SpanNear is fine, but SpanFirst will not work as there always has to be
> a starting offset. I can't search "bunch hooey" as SpanFirst unless I know
> that it was indexed as the second 'group' and therefore set the starting
> span position as 1002.
>
> Using Lucene has added a whole world of new search possibilities to the
> product, but when people have been using something a certain way for 15
> years, it can be difficult to shift their expectations :) There's always
> someone who will shout...
>
> Antony

------=_Part_15307_27255739.1172104838029--
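
For anyone following the position arithmetic in the quoted example, here is a
rough plain-Java sketch of it. It does not call Lucene at all. The class
PositionGapSketch and the helpers assignPositions and startsWithin are
hypothetical names invented for illustration, whitespace splitting stands in
for a real Analyzer, and the "span end <= end" test is only a simplified model
of how a SpanFirst-style check might treat a single term. Under those
assumptions it reproduces the positions listed above for a gap of 1000, and it
suggests why a gap of 0 would not restart positions at 0: the gap is added to
the running position, so the values would simply continue 0, 1, 2, 3...

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PositionGapSketch {

    // Hypothetical helper, not a Lucene API: assigns a position to every
    // whitespace-separated term, adding `gap` before each value after the first.
    public static Map<String, List<Integer>> assignPositions(List<String> values, int gap) {
        Map<String, List<Integer>> positions = new LinkedHashMap<>();
        int next = 0;            // position the next term will receive
        boolean first = true;
        for (String value : values) {
            if (!first) {
                next += gap;     // the gap is added to the running position
            }
            first = false;
            for (String term : value.trim().split("\\s+")) {
                if (term.isEmpty()) {
                    continue;    // skip the empty token produced by a blank value
                }
                positions.computeIfAbsent(term, k -> new ArrayList<>()).add(next);
                next++;
            }
        }
        return positions;
    }

    // Simplified model (an assumption): a single term at position p spans
    // [p, p + 1), and a "starts with" check passes when that span ends at or
    // before `end`.
    public static boolean startsWithin(Map<String, List<Integer>> positions, String term, int end) {
        for (int p : positions.getOrDefault(term, Collections.<Integer>emptyList())) {
            if (p + 1 <= end) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> values = Arrays.asList("some stuff", "bunch hooey", "what is this");
        Map<String, List<Integer>> withGap = assignPositions(values, 1000);
        System.out.println(withGap);
        System.out.println(assignPositions(values, 0));
        System.out.println(startsWithin(withGap, "some", 1));      // first value's first term
        System.out.println(startsWithin(withGap, "bunch", 1));     // second value's first term
        System.out.println(startsWithin(withGap, "bunch", 1003));  // only with a large end
    }
}
```

In this model "bunch" only passes the check once the end bound reaches 1003,
which is the difficulty Antony describes: a single small bound can only pick
out the first indexed value, whatever the gap is set to.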