lucene-java-user mailing list archives

From Doug Cutting <DCutt...@grandcentral.com>
Subject RE: Near without slop
Date Tue, 04 Dec 2001 17:00:08 GMT
> From: Paddy Clark [mailto:Paddy.Clark@grantadesign.com]
> >
> >My current "NEAR" solution is to modify the query parser to build a 
> >PhraseQuery from the terms surrounding NEAR and set the slop 
> >correctly.  This works for this kind of query:
> >
> >Bob NEAR Jim
> >
> >The problem comes when I try
> >
> >microsoft NEAR app*

One can do this currently by automatically generating a phrase query for
each of the possible phrases, e.g. "microsoft application", "microsoft
apps", but this is not very efficient, as it would process the term
'microsoft' many times.
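
As a rough illustration of the expansion described above, here is a minimal
sketch in plain Java.  It does not use the Lucene API: the sorted set stands
in for the index's term dictionary, and the names expandPrefix and
expandToPhrases are hypothetical.  Each returned pair would become one phrase
query, so 'microsoft' is repeated once per expansion.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

public class PrefixPhraseExpansion {

    /** Returns every dictionary term that starts with the given prefix. */
    static List<String> expandPrefix(SortedSet<String> dictionary, String prefix) {
        List<String> matches = new ArrayList<>();
        // In a sorted dictionary, terms sharing a prefix are contiguous,
        // so start at the prefix and stop at the first non-matching term.
        for (String term : dictionary.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    /** Builds one two-term phrase per expansion of the prefix. */
    static List<String[]> expandToPhrases(SortedSet<String> dictionary,
                                          String first, String prefix) {
        List<String[]> phrases = new ArrayList<>();
        for (String term : expandPrefix(dictionary, prefix)) {
            phrases.add(new String[] { first, term });
        }
        return phrases;
    }

    public static void main(String[] args) {
        // Assumed toy dictionary; a real index would supply its own terms.
        SortedSet<String> dictionary = new TreeSet<>(
            Arrays.asList("apple", "application", "apps", "microsoft", "zebra"));
        for (String[] phrase : expandToPhrases(dictionary, "microsoft", "app")) {
            System.out.println(String.join(" ", phrase));
        }
    }
}
```

With a large index the number of expansions, and so the number of separate
phrase queries, can grow quickly.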

I have proposed improving support for this sort of thing by adding the
following method to PhraseQuery:
  public void add(Term[] terms)
This would match any of the named terms at this position in the phrase.  One
could combine this with wildcard term enumeration to generate the phrase
query that you desire.  This would be more efficient than the above
approach.  For example, if you added {"a","b"} and {"x","y"} to a
PhraseQuery then it would match any of "a x", "a y", "b x" and "b y", and
much more efficiently than a query containing these four phrase queries.  It
would traverse the TermPositions for each term only once, while the
four-phrase version would traverse each twice, since every term appears in
two of the four phrases.
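
To make the idea concrete, here is a small sketch of matching a phrase whose
positions each accept several terms.  It is plain Java over an in-memory
position map for a single document, not the proposed PhraseQuery method and
not Lucene code; the class and method names are hypothetical, and it handles
only exact adjacency (no slop).  Each term's position list is read once per
match call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class MultiTermPhraseSketch {

    /** Builds a term -> positions map for one tokenized document. */
    static Map<String, List<Integer>> index(String[] tokens) {
        Map<String, List<Integer>> positions = new HashMap<>();
        for (int i = 0; i < tokens.length; i++) {
            positions.computeIfAbsent(tokens[i], k -> new ArrayList<>()).add(i);
        }
        return positions;
    }

    /**
     * Returns the start positions at which, for every slot i, the token at
     * start + i is one of the terms in slots.get(i).
     */
    static Set<Integer> match(Map<String, List<Integer>> positions,
                              List<String[]> slots) {
        Set<Integer> starts = null;
        for (int slot = 0; slot < slots.size(); slot++) {
            // Union the positions of all alternatives for this slot,
            // shifted back so they line up with the phrase start.
            Set<Integer> candidates = new TreeSet<>();
            for (String term : slots.get(slot)) {
                for (int pos : positions.getOrDefault(term, Collections.emptyList())) {
                    candidates.add(pos - slot);
                }
            }
            if (starts == null) {
                starts = candidates;
            } else {
                starts.retainAll(candidates); // keep starts valid for all slots so far
            }
        }
        return starts == null ? new TreeSet<>() : starts;
    }

    public static void main(String[] args) {
        Map<String, List<Integer>> positions =
            index(new String[] { "a", "x", "c", "b", "y", "a", "y" });
        List<String[]> slots = Arrays.asList(
            new String[] { "a", "b" },
            new String[] { "x", "y" });
        System.out.println(match(positions, slots)); // prints [0, 3, 5]
    }
}
```

The three matches correspond to "a x", "b y" and "a y" in the sample
document; the four-phrase alternative would have to look up the positions of
each of a, b, x and y twice to find the same matches.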

I don't know how soon I will get to implementing this though...

Doug
