From: "Allison, Timothy B."
To: "java-user@lucene.apache.org", Darin McBeath
Subject: RE: SpanQuery not working as expected
Date: Fri, 6 Jun 2014 14:12:29 +0000

Hi Darin,

Have you thought about using multivalued fields? If you set the positionIncrementGap to something kind of big (well > 1, say :) ), and you know that your data is always authorfirst, authorlast, you could just search for "darin fulford". The positionIncrementGap will prevent matching on doc2 below.

Doc1
Authorsfield: Darin fulford

Doc2
Authorsfield: Matilda darin Fulford alexandria

Don't get me wrong, I love the capabilities of SpanQuery, but will this simple solution meet your needs?

-----Original Message-----
From: Darin McBeath [mailto:ddmcbeath@yahoo.com.INVALID]
Sent: Thursday, June 05, 2014 7:17 PM
To: java-user@lucene.apache.org
Subject: SpanQuery not working as expected

I read through http://searchhub.org/2009/07/18/the-spanquery/, which provided a good overview of how one can construct fairly complex span queries. I was particularly interested in the ability to construct nested span queries. I'm trying to apply this concept to search a field that contains some structure (as below). I have a couple of other fields that will have a bit more nesting, but this should give the general idea.

  authors
    author [one or more]
      first name
      last name

Prior to indexing the content with Lucene, I added some 'markers' around the various bits I might want to search. For example, 'bauthor' marks the beginning of an author, 'eauthor' marks the end of an author, and 'sauthor' is a separator between individual authors (it is used as the exclude clause in a not span query). I do similar things for 'first name' and 'last name'.

My constructed query (as interpreted by Lucene) is included below. This was extracted from the 'parsed string' returned from the query when I set debug=true. Within a given 'authscope' field, I'm trying to find a situation where the author first name is 'darin' and the last name is 'fulford' within a given 'author'.

spanNot(
    spanNear(
        [authscope:bauthor,
         spanNear(
             [spanNot(
                  spanNear(
                      [authscope:bfname,
                       authscope:darin,
                       authscope:efname],
                      2147483647, true),
                  authscope:sfname, 0, 0),
              spanNot(
                  spanNear(
                      [authscope:blname,
                       authscope:fulford,
                       authscope:elname],
                      2147483647, true),
                  authscope:slname, 0, 0)],
             2147483647, false),
         authscope:eauthor],
        2147483647, true),
    authscope:sauthor, 0, 0)

I have loaded the following 2 documents into my index.

[
  {"id":"1", "authscope":" bauthors bauthor blname mcbeath elname slname bfname darin efname sfname eauthor sauthor bauthor blname fulford elname slname bfname darby efname sfname eauthor sauthor bauthor blname mcbeath elname slname bfname darby efname sfname eauthor sauthor eauthors sauthors "},
  {"id":"2", "authscope":" bauthors bauthor blname mcbeath elname slname bfname darin efname sfname eauthor sauthor bauthor blname fulford elname slname bfname darin efname sfname eauthor sauthor eauthors sauthors "}
]

What I can't figure out is why the above query would match on both documents. It should only match the document with id:2.

Any insights would be appreciated. I'm using Lucene 4.7.2.

Darin.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
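
The multivalued-field suggestion in the reply at the top of this message can be illustrated with a toy model of token positions. This is only a sketch and not Lucene code: the class `PositionGapSketch` and its methods are hypothetical names, and it assumes that each token advances the position by 1 and that every value after the first is pushed a further `gap` positions away (in Lucene itself the gap comes from `Analyzer.getPositionIncrementGap`). It also assumes Doc2's field holds two values, "Matilda darin" and "Fulford alexandria", which is one reading of the example above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Toy model only: this is NOT Lucene code and uses no Lucene classes.
class PositionGapSketch {

    /** One token plus the position the toy indexer assigned to it. */
    static final class Token {
        final String term;
        final int position;

        Token(String term, int position) {
            this.term = term;
            this.position = position;
        }
    }

    /**
     * Assigns positions to the tokens of a multivalued field.
     * Assumption: each token advances the position by 1, and every value
     * after the first is pushed a further `gap` positions away.
     */
    static List<Token> index(List<String> values, int gap) {
        List<Token> tokens = new ArrayList<>();
        int position = -1;
        for (int i = 0; i < values.size(); i++) {
            if (i > 0) {
                position += gap; // the hole between two field values
            }
            String normalized = values.get(i).toLowerCase(Locale.ROOT).trim();
            for (String term : normalized.split("\\s+")) {
                if (!term.isEmpty()) {
                    position++;
                    tokens.add(new Token(term, position));
                }
            }
        }
        return tokens;
    }

    /** True if the phrase terms occur at consecutive positions (slop 0). */
    static boolean phraseMatches(List<Token> tokens, String phrase) {
        String[] terms = phrase.toLowerCase(Locale.ROOT).trim().split("\\s+");
        for (Token start : tokens) {
            boolean matched = start.term.equals(terms[0]);
            for (int i = 1; matched && i < terms.length; i++) {
                matched = hasTermAt(tokens, terms[i], start.position + i);
            }
            if (matched) {
                return true;
            }
        }
        return false;
    }

    /** True if some token has exactly this term at exactly this position. */
    static boolean hasTermAt(List<Token> tokens, String term, int position) {
        for (Token t : tokens) {
            if (t.position == position && t.term.equals(term)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc1 = Arrays.asList("Darin fulford");
        // Assumed value boundary: two separate values in the same field.
        List<String> doc2 = Arrays.asList("Matilda darin", "Fulford alexandria");

        System.out.println(phraseMatches(index(doc1, 100), "darin fulford"));
        System.out.println(phraseMatches(index(doc2, 0), "darin fulford"));
        System.out.println(phraseMatches(index(doc2, 100), "darin fulford"));
    }
}
```

Under these assumptions the three checks print true, true, false: with no gap, "darin" and "fulford" end up adjacent across the value boundary in Doc2, while a gap of 100 separates them so the exact phrase only matches Doc1.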