lucene-java-user mailing list archives

From Darin McBeath <ddmcbe...@yahoo.com.INVALID>
Subject SpanQuery not working as expected
Date Thu, 05 Jun 2014 23:16:48 GMT
I read through http://searchhub.org/2009/07/18/the-spanquery/, which provided a good overview
for how one can construct fairly complex span queries.  I was particularly interested in
the ability to construct nested span queries.  I'm trying to apply this concept to search
a field that contains some structure (as below).  I have a couple of other fields that will
have a bit more nesting, but this should give the general idea.  

authors
  author [one or more]
    first name
    last name

Prior to indexing the content with Lucene, I added some 'markers' around the various bits
I might want to search.  For example, 'bauthor' marks the beginning of an author, 'eauthor'
marks the end of an author, and 'sauthor' is a separator between individual authors (used
as the exclude clause in a span-not query).  I do the same for 'first name' (bfname, efname,
sfname) and 'last name' (blname, elname, slname).
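
As a rough illustration of the marker scheme described above (a sketch only, not the actual indexing code), the field text could be assembled along these lines. The class name AuthorMarkers and the method encode are hypothetical, names are assumed to be single lowercase tokens, and whitespace is normalized to single spaces. The marker order (last name before first name) follows the sample documents further down.

```java
// Hypothetical helper: wraps each author in the begin/end/separator markers.
final class AuthorMarkers {

    // Each author is given as {firstName, lastName}; both are assumed to be
    // single lowercase tokens so that no analysis step is needed here.
    static String encode(String[][] authors) {
        StringBuilder sb = new StringBuilder("bauthors");
        for (String[] author : authors) {
            sb.append(" bauthor")
              .append(" blname ").append(author[1]).append(" elname slname")
              .append(" bfname ").append(author[0]).append(" efname sfname")
              .append(" eauthor sauthor");
        }
        return sb.append(" eauthors sauthors").toString();
    }

    public static void main(String[] args) {
        String[][] authors = { {"darin", "mcbeath"}, {"darby", "fulford"} };
        System.out.println(encode(authors));
    }
}
```

The resulting string is what would go into the 'authscope' field of each document.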

My constructed query (as interpreted by Lucene) is included below.  This was extracted from
the 'parsed string' returned from the query when I set debug=true.  Within a given 'authscope'
field, I'm trying to match only documents in which a single 'author' has both the first name
'darin' and the last name 'fulford'.

spanNot(
    spanNear(
        [authscope:bauthor, 
        spanNear(
            [spanNot(
                spanNear(
                    [authscope:bfname, 
                    authscope:darin, 
                    authscope:efname], 
                    2147483647, true), 
                authscope:sfname, 0, 0), 
             spanNot(
                spanNear(
                    [authscope:blname, 
                    authscope:fulford, 
                    authscope:elname], 
                    2147483647, true), 
                authscope:slname, 0, 0)], 
             2147483647, false), 
         authscope:eauthor], 
         2147483647, true), 
     authscope:sauthor, 0, 0)

I have loaded the following two documents into my index.

[
  {"id":"1", "authscope":" bauthors  bauthor blname mcbeath elname slname  bfname 
darin efname sfname  eauthor sauthor  bauthor blname  fulford elname slname  bfname 
darby efname sfname  eauthor sauthor  bauthor blname  mcbeath elname slname  bfname 
darby efname sfname  eauthor sauthor  eauthors sauthors "},
  {"id":"2", "authscope":" bauthors  bauthor blname  mcbeath elname slname  bfname 
darin efname sfname  eauthor sauthor  bauthor blname  fulford elname slname  bfname 
darin efname sfname  eauthor sauthor  eauthors sauthors "}
]

What I can't figure out is why the above query matches both documents.  It should
only match the document with id:2.
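
To make the expected behavior concrete, here is a plain-Java check of the intended "same author" semantics, independent of Lucene. It is only a sketch of what the query is meant to express, not an explanation of why the span query behaves differently. The class name AuthorScopeCheck and the method sameAuthorMatch are hypothetical, and names are assumed to be single tokens.

```java
// Hypothetical reference check: does any single bauthor..eauthor block contain
// both "bfname <first> efname" and "blname <last> elname"?
final class AuthorScopeCheck {

    static boolean sameAuthorMatch(String field, String first, String last) {
        String[] tokens = field.trim().split("\\s+");
        boolean inAuthor = false, sawFirst = false, sawLast = false;
        for (int i = 0; i < tokens.length; i++) {
            String t = tokens[i];
            if (t.equals("bauthor")) {
                // A new author block starts: reset what has been seen so far.
                inAuthor = true;
                sawFirst = false;
                sawLast = false;
            } else if (t.equals("eauthor")) {
                if (inAuthor && sawFirst && sawLast) {
                    return true;
                }
                inAuthor = false;
            } else if (inAuthor && i + 2 < tokens.length) {
                if (t.equals("bfname") && tokens[i + 1].equals(first)
                        && tokens[i + 2].equals("efname")) {
                    sawFirst = true;
                }
                if (t.equals("blname") && tokens[i + 1].equals(last)
                        && tokens[i + 2].equals("elname")) {
                    sawLast = true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String doc1 = "bauthors bauthor blname mcbeath elname slname bfname darin efname sfname eauthor sauthor"
                + " bauthor blname fulford elname slname bfname darby efname sfname eauthor sauthor"
                + " bauthor blname mcbeath elname slname bfname darby efname sfname eauthor sauthor"
                + " eauthors sauthors";
        String doc2 = "bauthors bauthor blname mcbeath elname slname bfname darin efname sfname eauthor sauthor"
                + " bauthor blname fulford elname slname bfname darin efname sfname eauthor sauthor"
                + " eauthors sauthors";
        System.out.println("id:1 -> " + sameAuthorMatch(doc1, "darin", "fulford"));
        System.out.println("id:2 -> " + sameAuthorMatch(doc2, "darin", "fulford"));
    }
}
```

Under this check, id:1 does not match (darin and fulford belong to different authors) and id:2 does, which is the result expected from the span query.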


Any insights would be appreciated.  I'm using Lucene 4.7.2.

Darin.

