lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: non-overlapping Span queries
Date Fri, 08 Dec 2006 15:46:04 GMT
On Friday 08 December 2006 00:09, Chris Hostetter wrote:
> 
> : > Brooklyn High which is in Brooklyn, NY
> :
> : That requires a minimum distance between the matches of the
> : subqueries, and that is not yet implemented.
> 
> I was about to suggest that adding that seems like it would be fairly
> easy, just add a new "int minDistance" to SpanNearQuery and then use it in
> NearSpansOrdered.docSpansOrdered to ensure that "end1 + minDistance <=
> start2" and in NearSpansUnordered.atMatch to test that "min.end() +
> minDistance <= max.start()" ... but then it occurred to me that the whole
> issue isn't that simple when you have a SpanNearQuery with more than two
> clauses.

It can be as simple as you suggest. IIRC I initially implemented the
ordered case the way you describe, with minDistance == 0, in
NearSpansOrdered.java at Lucene issue 413,
http://issues.apache.org/jira/browse/LUCENE-413
(see also the comments at Lucene issue 569).

Ruslan, chances are that the 413 version works the way you need,
but only for the ordered case. When you also need the non ordered case,
you can simply combine both possible orders with an OR
(a Boolean OR or a SpanOrQuery).
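
A minimal sketch of the check discussed above, in plain Java and without
any Lucene classes. The names Span, MinDistanceSketch and both method
names are hypothetical and exist only for this illustration; spans are
assumed to have an inclusive start and an exclusive end. The ordered test
is the "end1 + minDistance <= start2" condition from the quoted
suggestion, and the non ordered test is the OR of both possible orders.

```java
/** Hypothetical stand-in for one matched span: start inclusive, end exclusive. */
final class Span {
    final int start;
    final int end;

    Span(int start, int end) {
        this.start = start;
        this.end = end;
    }
}

/** Illustration only; this is not the code attached to LUCENE-413. */
final class MinDistanceSketch {

    /** Ordered, non overlapping: first must end at least minDistance before second starts. */
    static boolean orderedWithMinDistance(Span first, Span second, int minDistance) {
        return first.end + minDistance <= second.start;
    }

    /** Non ordered case, expressed as the OR of both possible orders. */
    static boolean unorderedWithMinDistance(Span a, Span b, int minDistance) {
        return orderedWithMinDistance(a, b, minDistance)
            || orderedWithMinDistance(b, a, minDistance);
    }
}
```

For "Brooklyn High which is in Brooklyn, NY", with "Brooklyn High" at
positions [0,2) and the two "Brooklyn" matches at [0,1) and [5,6), the
ordered test with minDistance == 0 rejects the overlapping first
"Brooklyn" and accepts the second one.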

> 
> I'm not even sure what a three clause SpanNearQuery with a minDistance of N
> would even mean .. is that the min distance between each clause, or
> between the outermost?
> 
> Paul: you understand Span queries a lot better than I do: if you had a
> two clause SpanNear would my suggestion make sense?
> 
> we could always add minDistance to SpanNearQuery, but make it private
> and only settable from a new constructor that explicitly only takes in two
> SpanQuery clauses (instead of an array).

Basically there are two independent ways in which spans can match:
overlapping / non overlapping, and ordered / non ordered.
In the current trunk the overlapping ordered and non ordered cases 
are implemented.
At Lucene issue 413 there is an implementation of the non overlapping
ordered case. That leaves the non overlapping non ordered case
to be implemented.

When there is no overlap, a minimum distance between the matching spans
makes sense. With overlap, one might try and define some negative distance
as the overlap, but I can't think of any real life cases for that.

At the moment I don't recall the details of the maximally allowed slop;
I have not yet looked at the code again.
Ideally the overlaps, distances and slops would be combined into
a single minimum and a single maximum to be passed to the constructor.
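
One possible reading of "one minimum and one maximum" for the two clause
ordered case, as a sketch only: treat the distance between the spans as a
signed gap, negative when they overlap, and accept a match when the gap
lies within [minGap, maxGap]. The class GapRange and its members are
hypothetical and are not part of SpanNearQuery; how the existing slop
would map onto maxGap is an assumption here.

```java
/** Hypothetical holder for a single minimum and maximum gap; not a Lucene class. */
final class GapRange {
    final int minGap; // negative values would permit that much overlap
    final int maxGap; // assumed to play a role similar to slop

    GapRange(int minGap, int maxGap) {
        if (minGap > maxGap) {
            throw new IllegalArgumentException("minGap must not exceed maxGap");
        }
        this.minGap = minGap;
        this.maxGap = maxGap;
    }

    /** Signed gap between two ordered spans (end exclusive); negative means overlap. */
    static int gap(int firstEnd, int secondStart) {
        return secondStart - firstEnd;
    }

    /** True when the gap between the two spans falls inside [minGap, maxGap]. */
    boolean accepts(int firstEnd, int secondStart) {
        int g = gap(firstEnd, secondStart);
        return minGap <= g && g <= maxGap;
    }
}
```

With minGap == 0 this reduces to the non overlapping ordered test, and a
negative minGap would express the "negative distance as the overlap" idea
mentioned above.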

One simple way out of this would be to have non ordered span queries
with overlap, and to have ordered span queries without overlap.
This could be done by replacing the trunk NearSpansOrdered.java
by the one at Lucene issue 413.

Regards,
Paul Elschot
