lucene-java-user mailing list archives

From Alan Woodward <alan.woodw...@romseysoftware.co.uk>
Subject Re: Approches/semantics for arbitrarily combining boolean and proximity search operators?
Date Thu, 17 May 2012 07:58:00 GMT
I've just had to implement exactly this - the solution I came up with was to translate:

A w/5 (B and C) -> spanNear(A, spanNear(A, B, 5), spanNear(A, C, 5), 0)
A w/5 (B or C) -> OR(spanNear(A, B, 5), spanNear(A, C, 5))

More complex queries (such as (A AND B) w/5 (C AND D)) are dealt with by applying the above
rules recursively.  You do end up with some horribly overcomplicated queries, but it seems
to be performant enough.
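
As a rough illustration of how the two rules can be applied recursively, here is a minimal sketch that rewrites a tiny query tree into the notation above and prints the result as a string. It deliberately does not use the Lucene API. The Node class and the near() helper are hypothetical names made up for this example. Two choices here are assumptions rather than part of the rules above: w/N is treated as unordered, so a compound left-hand side is handled by swapping the operands, and when both sides are compound the outer spanNear has no single anchor term and simply wraps the sub-queries.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class ProximityRewriter {
    /** Hypothetical parsed-query node: a single term, or AND/OR over child nodes. */
    static final class Node {
        final String op;        // "TERM", "AND" or "OR"
        final String term;      // non-null only when op is "TERM"
        final List<Node> kids;  // empty when op is "TERM"

        Node(String op, String term, List<Node> kids) {
            this.op = op;
            this.term = term;
            this.kids = kids;
        }

        boolean isTerm() { return op.equals("TERM"); }
    }

    static Node term(String t) { return new Node("TERM", t, new ArrayList<Node>()); }
    static Node and(Node... kids) { return new Node("AND", null, Arrays.asList(kids)); }
    static Node or(Node... kids) { return new Node("OR", null, Arrays.asList(kids)); }

    /** Rewrites "left w/n right" into nested spanNear/OR notation. */
    static String near(Node left, Node right, int n) {
        if (left.isTerm() && right.isTerm()) {
            return "spanNear(" + left.term + ", " + right.term + ", " + n + ")";
        }
        if (right.isTerm()) {
            // Assumption: w/n is unordered, so put the compound side on the right.
            return near(right, left, n);
        }
        // Distribute the left operand over each child of the compound right side.
        List<String> parts = new ArrayList<String>();
        for (Node kid : right.kids) {
            parts.add(near(left, kid, n));
        }
        String joined = String.join(", ", parts);
        if (right.op.equals("OR")) {
            return "OR(" + joined + ")";
        }
        // AND: require the sub-matches to coincide (slop 0). The anchor term is
        // included only when the left side is a single term (assumption).
        String anchor = left.isTerm() ? left.term + ", " : "";
        return "spanNear(" + anchor + joined + ", 0)";
    }

    public static void main(String[] args) {
        Node a = term("A"), b = term("B"), c = term("C"), d = term("D");
        System.out.println(near(a, and(b, c), 5));
        System.out.println(near(a, or(b, c), 5));
        System.out.println(near(and(a, b), and(c, d), 5));
    }
}
```

The third call in main shows how quickly the output grows once both sides are compound.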

 

On 17 May 2012, at 04:38, Mike Sokolov wrote:

> It sounds to me as if there could be a market for a new kind of query that would implement:
> 
> A w/5 (B and C)
> 
> in the way that people understand it to mean - the same A near both B and C, not just any A.
> 
> Maybe it's too hard to implement using rewrites into existing SpanQueries?
> 
> In terms of the PositionIterator work - instead of A being within 5 in a "minimum" distance
> sense, what we want is that its "maximum" distance to all the terms in the other query (B
> and C) is 5.  I'm not sure if any query in that branch covers this case though, either, but
> if I recall, there was a way to implement extensions to it that were fairly natural.
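
To make the "minimum" versus "maximum" distinction concrete, here is a small sketch over plain token positions. It is not the PositionIterator API and not a Lucene query. The class and method names are hypothetical, positions are assumed to be sorted arrays of token offsets within one document, and distance is simplified to the absolute difference between positions.

```java
import java.util.Arrays;

final class SameAnchorProximity {
    /** Distance from pos to the nearest entry of a sorted array, or Integer.MAX_VALUE if it is empty. */
    static int nearest(int pos, int[] sorted) {
        int i = Arrays.binarySearch(sorted, pos);
        if (i >= 0) {
            return 0;
        }
        int ins = -i - 1;  // index of the first entry greater than pos
        int best = Integer.MAX_VALUE;
        if (ins < sorted.length) {
            best = sorted[ins] - pos;
        }
        if (ins > 0) {
            best = Math.min(best, pos - sorted[ins - 1]);
        }
        return best;
    }

    /** "Same A": one single A position whose maximum distance to the other terms is at most n. */
    static boolean sameAnchorWithin(int[] a, int n, int[]... others) {
        for (int pos : a) {
            int worst = 0;
            for (int[] other : others) {
                worst = Math.max(worst, nearest(pos, other));
            }
            if (worst <= n) {
                return true;
            }
        }
        return false;
    }

    /** "Any A": every other term is within n of some A, possibly a different A each time. */
    static boolean anyAnchorWithin(int[] a, int n, int[]... others) {
        for (int[] other : others) {
            boolean hit = false;
            for (int pos : a) {
                if (nearest(pos, other) <= n) {
                    hit = true;
                    break;
                }
            }
            if (!hit) {
                return false;
            }
        }
        return true;
    }
}
```

With A at positions 0 and 100, B at 3 and C at 102, anyAnchorWithin accepts the document for n = 5 while sameAnchorWithin rejects it, which is the overmatch being discussed.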
> 
> -Mike
> 
> On 5/16/2012 7:15 PM, Trejkaz wrote:
>> On Thu, May 17, 2012 at 7:11 AM, Chris Harris<ryguasu@gmail.com>  wrote:
>>> but also crazier ones, perhaps like
>>> 
>>> agreement w/5 (medical and companion)
>>> (dog or dragon) w/5 (cat and cow)
>>> (daisy and (dog or dragon)) w/25 (cat not cow)
>> [skip]
>> 
>> Everything in your post matches our experience. We ended up writing
>> something which transforms the query as well but had to give up on
>> certain crazy things people tried, such as this form:
>> 
>>    (A and B) w/5 (C and D)
>> 
>> For this one:
>> 
>>   A w/5 (B and C)
>> 
>> We found the user expected the same A to be within 5 terms of both a B
>> and a C, and rewrote it to match that but also match more than they
>> asked for. So far, there have been no complaints about the overmatches
>> (it's documented.)
>> 
>> There is probably an extremely accurate way to rewrite it, but it
>> couldn't be figured out at the time. Maybe start with spans for A and
>> then remove spans not-near a B and spans not-near a C, which would
>> leave you with only spans near an A. The problem is that if you expand
>> the query to something like this, it gets quite a bit more complex, so
>> a user query which is already complex could turn into a really hard to
>> understand mess...
>> 
>> TX
>> 
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> 