lucene-dev mailing list archives

From Paul Elschot <paul.elsc...@xs4all.nl>
Subject Re: SpanNearQuery with minimum slop
Date Wed, 12 Apr 2006 19:40:43 GMT
On Wednesday 12 April 2006 19:21, Erik Hatcher wrote:
> 
> On Apr 11, 2006, at 11:56 AM, Doug Cutting wrote:
> > Erik Hatcher wrote:
> >> I have a potential need for a SpanNearQuery with an exact
> >> non-zero gap specified
> >
> > Ironically, you can now easily specify this with PhraseQuery, but  
> > not with SpanNearQuery.  You can construct a phrase query with  
> > explicit positions, e.g.:
> >
> > PhraseQuery pq = new PhraseQuery();
> > pq.add(new Term("f", "x"), 0);
> > pq.add(new Term("f", "y"), 2);
> >
> > This will match instances of "x" and "y" with exactly one word  
> > between them in field "f".
> >
> > Unfortunately one cannot (yet) construct a SpanNearQuery with  
> > explicit positions.  I think that would be useful, and not too hard  
> > to add. Internally all the computation is in terms of positions, so  
> > adding this is mostly a matter of exposing this in the public API.
> >
> >> Would this be as easy as modifying SpanNearQuery to have a
> >> minimum and maximum slop feature, and modifying
> >> NearSpans.checkSlop() to add a condition that the difference is
> >> >= the minimum slop? At first glance it seems so, but I want to
> >> be sure I'm not missing something.
> >
> > If you want to match something like "a ? b ? ? c", where "?"  
> > matches any word, then slop is not sufficient, you really need  
> > explicit positions, or else you'll match things like "a ? ? b ? c".
> 
> In my case, I was thinking of joining two zero-slopped SpanNearQuerys  
> within an outer SpanNearQuery with a slop of _exactly_ one, so it'd  
> be like this:  (a b) ? (c d)

This could still be done with a (variation of) PhraseQuery.
SpanQuerys are most useful when you need to nest them, for example
when (aa bb) should match as an alternative to (a b) in the example
above. Without nesting, a "flat" phrase query on the terms can be used.
Nesting Scorers adds method calls at each level, and these cost some
search performance.

When nesting and exact slop matching are both needed, a simplified
NearSpansOrdered from LUCENE-413 could be considered.
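
To illustrate the difference between slop and explicit positions
discussed in the quoted text, here is a minimal sketch in plain Java.
It does not use the Lucene API and is not how Lucene implements
matching; the class PositionMatchSketch and its method names are
hypothetical, and the token lists stand in for an analyzed field. The
"total gap" check only models the idea of a single slop value for an
ordered match, under the assumption that slop counts the tokens between
the matched terms.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only: models matching on token positions,
// not Lucene's PhraseQuery or SpanNearQuery implementation.
final class PositionMatchSketch {

    // Explicit positions: terms[i] must occur at start + offsets[i].
    static boolean matchesExactPositions(List<String> tokens, String[] terms, int[] offsets) {
        for (int start = 0; start < tokens.size(); start++) {
            boolean all = true;
            for (int i = 0; i < terms.length; i++) {
                int pos = start + offsets[i];
                if (pos >= tokens.size() || !tokens.get(pos).equals(terms[i])) {
                    all = false;
                    break;
                }
            }
            if (all) {
                return true;
            }
        }
        return false;
    }

    // Single slop value: terms must occur in order, and the number of
    // other tokens inside the whole match must equal totalGap.
    static boolean matchesOrderedWithTotalGap(List<String> tokens, String[] terms, int totalGap) {
        for (int first = 0; first < tokens.size(); first++) {
            if (tokens.get(first).equals(terms[0])
                    && search(tokens, terms, 1, first, first, totalGap)) {
                return true;
            }
        }
        return false;
    }

    private static boolean search(List<String> tokens, String[] terms,
                                  int next, int first, int prev, int totalGap) {
        if (next == terms.length) {
            // Span length minus the matched terms gives the gap tokens.
            return (prev - first + 1) - terms.length == totalGap;
        }
        for (int pos = prev + 1; pos < tokens.size(); pos++) {
            if (tokens.get(pos).equals(terms[next])
                    && search(tokens, terms, next + 1, first, pos, totalGap)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] terms = {"a", "b", "c"};
        int[] offsets = {0, 2, 5};  // the pattern "a ? b ? ? c"
        List<String> intended = Arrays.asList("a", "x", "b", "x", "x", "c");
        List<String> shifted = Arrays.asList("a", "x", "x", "b", "x", "c");

        System.out.println(matchesExactPositions(intended, terms, offsets));
        System.out.println(matchesExactPositions(shifted, terms, offsets));
        System.out.println(matchesOrderedWithTotalGap(intended, terms, 3));
        System.out.println(matchesOrderedWithTotalGap(shifted, terms, 3));
    }
}
```

In this model, the explicit-position check accepts "a x b x x c" and
rejects "a x x b x c", while the total-gap check accepts both, since
each has three tokens between "a" and "c".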

Regards,
Paul Elschot
