lucene-java-user mailing list archives

From Chris Harris <rygu...@gmail.com>
Subject Re: Approches/semantics for arbitrarily combining boolean and proximity search operators?
Date Thu, 17 May 2012 19:02:10 GMT
My first impression is that it's a reasonably clever way to get the user's
intent basically right without having to add a new SpanQuery. Have you
come up with any edge cases where it could do something unexpected?

So far I've thought of one, though you could argue it has more to do
with the "minimum/lazy/nonoverlapping match" nature of SpanQuery than
with your particular implementation of "and": Suppose there's a
document whose complete text is

    B A x A x x x x C

From my hypothetical user's perspective, this should match the query
[A w/5 (B and C)], because the second "A" is within slop 5 of both B
and C. However, because SpanNear only does minimum-ish matches, this
document *won't* match the rewritten query SpanNear(A, spanNear(A, B,
5), spanNear(A, C, 5), 0); the only span generated for the SpanNear(A,
B, 5) subquery will be "B A", and the only span for SpanNear(A, C, 5)
will be "A x x x x C", and those two are not adjacent, so there's no
match for the outer SpanNear.
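Here is a minimal sketch of the reading the hypothetical user intends, where the same A must be near both B and C. It scans a tokenized document and reports every A position that has some B and some C within the slop. This is not Lucene code. The class and method names are hypothetical, and `gap` approximates SpanNear slop for two single-term spans as the number of positions strictly between them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical brute-force checker, not Lucene API.
public class SameAnchorNear {
    // Approximate slop between two single-term spans at positions p and q:
    // the number of positions strictly between them.
    static int gap(int p, int q) {
        return Math.abs(p - q) - 1;
    }

    // True if some occurrence of `term` lies within `slop` of position `anchor`.
    static boolean near(String[] tokens, int anchor, String term, int slop) {
        for (int i = 0; i < tokens.length; i++) {
            if (i != anchor && tokens[i].equals(term) && gap(anchor, i) <= slop) {
                return true;
            }
        }
        return false;
    }

    // Positions of every `a` that is within `slop` of both `b` and `c`,
    // i.e. the "same A near both B and C" reading of [A w/slop (B and C)].
    static List<Integer> matchingAnchors(String[] tokens, String a, String b, String c, int slop) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equals(a) && near(tokens, i, b, slop) && near(tokens, i, c, slop)) {
                result.add(i);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String[] doc = "B A x A x x x x C".split(" ");
        // The first A (position 1) has 6 positions between it and C, so only
        // the second A (position 3) qualifies.
        System.out.println(matchingAnchors(doc, "A", "B", "C", 5)); // prints [3]
    }
}
```

On the example document it reports only position 3, the second A. That is exactly the match the rewritten SpanNear query misses, because Lucene builds the two inner spans independently.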

Also, while we're exploring your solution, do you also have a rule to
cover "not"?

On Thu, May 17, 2012 at 12:58 AM, Alan Woodward
<alan.woodward@romseysoftware.co.uk> wrote:
> I've just had to implement exactly this - the solution I came up with was to translate:
>
> A w/5 (B and C) -> SpanNear(A, spanNear(A, B, 5), spanNear(A, C, 5), 0)
> A w/5 (B or C) -> OR(spanNear(A, B, 5), spanNear(A, C, 5))
>
> More complex queries (such as (A AND B) w/5 (C AND D)) are dealt with by applying the
> above rules recursively. You do end up with some horribly overcomplicated queries, but it
> seems to be performant enough.
>
> On 17 May 2012, at 04:38, Mike Sokolov wrote:
>
>> It sounds to me as if there could be a market for a new kind of query that would implement:
>>
>> A w/5 (B and C)
>>
>> in the way that people understand it to mean - the same A near both B and C, not
>> just any A.
>>
>> Maybe it's too hard to implement using rewrites into existing SpanQueries?
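Alan's two translation rules, quoted above, can be sketched as a small recursive rewrite over a hypothetical query tree. The `Term`, `And` and `Or` types and the `near` method are illustrative assumptions, not Lucene API. The output is a string in the notation used in this thread rather than real `SpanNearQuery`/`SpanOrQuery` objects. The sketch recurses only into the right-hand operand, because the thread doesn't say exactly how a compound left operand like (A AND B) is expanded.

```java
// Hypothetical query tree and rewriter; not Lucene classes.
public class ProximityRewrite {
    interface Node {}
    record Term(String text) implements Node {}
    record And(Node left, Node right) implements Node {}
    record Or(Node left, Node right) implements Node {}

    // Rewrite "anchor w/slop expr" into a textual span query, applying the two
    // quoted rules and recursing into nested AND/OR on the right-hand side.
    static String near(String anchor, Node expr, int slop) {
        if (expr instanceof Term t) {
            return "SpanNear(" + anchor + ", " + t.text() + ", " + slop + ")";
        }
        if (expr instanceof And a) {
            // A w/n (X and Y) -> SpanNear(A, [A w/n X], [A w/n Y], 0)
            return "SpanNear(" + anchor + ", " + near(anchor, a.left(), slop) + ", "
                    + near(anchor, a.right(), slop) + ", 0)";
        }
        if (expr instanceof Or o) {
            // A w/n (X or Y) -> OR([A w/n X], [A w/n Y])
            return "OR(" + near(anchor, o.left(), slop) + ", "
                    + near(anchor, o.right(), slop) + ")";
        }
        throw new IllegalArgumentException("unsupported node: " + expr);
    }

    public static void main(String[] args) {
        // Prints: SpanNear(A, SpanNear(A, B, 5), SpanNear(A, C, 5), 0)
        System.out.println(near("A", new And(new Term("B"), new Term("C")), 5));
        // Prints: OR(SpanNear(A, B, 5), SpanNear(A, C, 5))
        System.out.println(near("A", new Or(new Term("B"), new Term("C")), 5));
    }
}
```

For a nested query such as A w/5 (B and (C or D)), it produces SpanNear(A, SpanNear(A, B, 5), OR(SpanNear(A, C, 5), SpanNear(A, D, 5)), 0). This shows how quickly the rewritten queries grow.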

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

