lucene-java-user mailing list archives

From Alan Woodward <alan.woodw...@romseysoftware.co.uk>
Subject Re: Approaches/semantics for arbitrarily combining boolean and proximity search operators?
Date Thu, 17 May 2012 19:53:50 GMT
You're right, those cases won't be covered, and probably can't be without some hacking at the
NearSpans* classes.  The other niggle I've found is that it doesn't play well with highlighting
- you get the entire span highlighted, rather than the individual terms within it.
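
To make the uncovered case concrete, the document from the quoted message below ("B A x A x x x x C") can be checked against the semantics the user expects with a small position model. This is only a sketch in plain Java, not Lucene code: SameAnchorModel, gap, hasNear and anchorNearBoth are hypothetical names, and "within 5" is assumed to mean at most 5 intervening tokens, which approximates how slop counts between two single-term clauses.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical reference model of "A w/5 (B and C)": the SAME occurrence
// of A must be close to both B and C. No Lucene classes are involved.
class SameAnchorModel {

    // Number of tokens strictly between positions p and q (assumed slop measure).
    static int gap(int p, int q) {
        return Math.abs(p - q) - 1;
    }

    // True if some occurrence of term has at most slop tokens between it and pos.
    static boolean hasNear(List<String> tokens, int pos, String term, int slop) {
        for (int j = 0; j < tokens.size(); j++) {
            if (tokens.get(j).equals(term) && gap(pos, j) <= slop) {
                return true;
            }
        }
        return false;
    }

    // Position of the first occurrence of a that is near both b and c, or -1.
    static int anchorNearBoth(List<String> tokens, String a, String b, String c, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (tokens.get(i).equals(a)
                    && hasNear(tokens, i, b, slop)
                    && hasNear(tokens, i, c, slop)) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("B A x A x x x x C".split(" "));
        // The first A (position 1) is too far from C; the second A (position 3) qualifies.
        System.out.println(anchorNearBoth(doc, "A", "B", "C", 5)); // prints 3
    }
}
```

Under this model the document matches through the second A, which is the outcome the rewritten SpanNear query is reported to miss.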

For NOT WITHIN queries, I use the following:

X NOT WITHIN/5 Y -> SpanNotQuery(X, SpanNear(X, Y, 5))

which finds all instances of X, and then removes any that are also within 5 of Y.
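
The intent of that rule can be sketched as a plain-Java position model, which may be handy as a reference when testing the rewrite. It does not use Lucene: NotWithinModel, gap and notWithin are hypothetical names, X and Y are assumed to be distinct single terms, and "within 5" is assumed to mean at most 5 intervening tokens. Since SpanNear only produces minimum-ish matches, the actual SpanNotQuery could differ from this model on some documents.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reference model of "X NOT WITHIN/n Y". No Lucene classes are involved.
class NotWithinModel {

    // Number of tokens strictly between positions p and q (assumed slop measure).
    static int gap(int p, int q) {
        return Math.abs(p - q) - 1;
    }

    // Positions of x that have no occurrence of y within slop intervening tokens.
    static List<Integer> notWithin(List<String> tokens, String x, String y, int slop) {
        List<Integer> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(x)) {
                continue;
            }
            boolean nearY = false;
            for (int j = 0; j < tokens.size(); j++) {
                if (tokens.get(j).equals(y) && gap(i, j) <= slop) {
                    nearY = true;
                    break;
                }
            }
            if (!nearY) {
                kept.add(i); // this X survives the NOT WITHIN filter
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("X a Y b c d e f g h X".split(" "));
        // X at position 0 is close to Y and is dropped; X at position 10 is kept.
        System.out.println(notWithin(doc, "X", "Y", 5)); // prints [10]
    }
}
```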

On 17 May 2012, at 20:02, Chris Harris wrote:

> First impression is, that's a reasonably clever way to get the user
> intent basically right without having to add a new SpanQuery. Have you
> come up with any edge cases where it could do something unexpected?
> 
> So far I've thought of one, though you could argue it has more to do
> with the "minimum/lazy/nonoverlapping match" nature of SpanQuery than
> with your particular implementation of "and": Suppose there's a
> document whose complete text is
> 
>    B A x A x x x x C
> 
> From my hypothetical user's perspective, this should match the query
> [A w/5 (B and C)], because the second "A" is within slop 5 of both B
> and C. However, because SpanNear only does minimum-ish matches, this
> document *won't* match the rewritten query SpanNear(A, spanNear(A, B,
> 5), spanNear(A, C, 5), 0); the only span generated for the SpanNear(A,
> B, 5) subquery will be "B A", and the only span for SpanNear(A, C, 5)
> will be "A x x x x C", and those two are not adjacent, so there's no
> match for the outer SpanNear.
> 
> Also, while we're exploring your solution, do you also have a rule to
> cover "not"?
> 
> On Thu, May 17, 2012 at 12:58 AM, Alan Woodward
> <alan.woodward@romseysoftware.co.uk> wrote:
>> I've just had to implement exactly this - the solution I came up with was to translate:
>> 
>> A w/5 (B and C) -> SpanNear(A, spanNear(A, B, 5), spanNear(A, C, 5), 0)
>> A w/5 (B or C) -> OR(spanNear(A, B, 5), spanNear(A, C, 5))
>> 
>> More complex queries (such as (A AND B) w/5 (C AND D)) are dealt with by applying
>> the above rules recursively.  You do end up with some horribly overcomplicated queries, but
>> it seems to be performant enough.
>> 
>> 
>> 
>> On 17 May 2012, at 04:38, Mike Sokolov wrote:
>> 
>>> It sounds to me as if there could be a market for a new kind of query that would
>>> implement:
>>> 
>>> A w/5 (B and C)
>>> 
>>> in the way that people understand it to mean - the same A near both B and C,
>>> not just any A.
>>> 
>>> Maybe it's too hard to implement using rewrites into existing SpanQueries?
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 

