lucene-java-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From "Steven A Rowe" <sar...@syr.edu>
Subject RE: Boolean expression for no terms OR matching a wildcard
Date Mon, 21 Jul 2008 19:00:17 GMT
Hi Ronald,

Caveat - I haven't tested this, but:

With a RegexQuery <http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/regex/RegexQuery.html>,
I think you can do something like (using your example):

   +abc*123 -{Regex}(?!abc.*123$)

This query would include all documents that have terms that match the wildcard "abc*123",
and exclude all documents containing terms that don't match regex "^abc.*123$".

Note that the Lucene QueryParser doesn't handle regex queries (and if it did, the syntax would
probably be different than "{Regex}" - this was intended solely for purposes of exposition).
 As a result, you would have to manually construct the RegexQuery and combine it using BooleanQuery
clauses with your wildcard query.

The "(?!...)" syntax is a negative lookahead assertion - this is a Java 1.4+ java.util.regex.Pattern
feature.  Note that wildcard expressions are easily programmatically converted to regular
expressions by substituting "*"->".*" and "?"->".", and then adding the "$" anchor.
 The "^" anchor is not required with RegexQuery's, because when using the java.util.regex
engine (the default engine), j.u.r.Matcher.lookingAt() is used; from <http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html#lookingAt()>:

   Attempts to match the input sequence, starting at the
   beginning, against the pattern.

   Like the matches method, this method always starts at the
   beginning of the input sequence; unlike that method, it
   does not require that the entire input sequence be matched. 

Caveat #2: RegexQuery's are relatively slow, since *all* index terms have to be tested against
the regular expression, so you may have to use some other method if query response time turns
out to be a problem.

Steve

On 07/20/2008 at 8:29 AM, Ronald Rudy wrote:
> A query solution is preferable.. but I can programmatically
> filter my results after the fact, it just seems like something that
> the Lucene team should consider adding.. I think it would only have
> value for wildcard queries, but nonetheless it would have some value
> I think..
> 
> -Ron
> 
> On Jul 18, 2008, at 6:24 PM, eks dev wrote:
> 
> > Analyzer that detects your condition "ALL match something", if
> > possible at all...
> > e.g. "800123456 80034543534 80023423423" -> 800
> > 
> > than you put it in ALL_MATCH field and match this condition against
> > it... if this prefix needs to be variable, you could extract all
> > matching prefixes to this fiield an make your query work like
> > "ALL_MATCH:800" and care not for the rest :) than yo would not need
> > field1 at all for these queries
> > 
> > you were looking for something like this or you need "Query solution"?
> > 
> > ----- Original Message ----
> > > From: Chris Hostetter <hossman_lucene@fucit.org>
> > > To: java-user@lucene.apache.org
> > > Sent: Saturday, 19 July, 2008 12:00:39 AM
> > > Subject: Re: Boolean expression for no terms OR matching a wildcard
> > > 
> > > > Maybe this is easier ... suppose what I'm indexing is a phone number,
> > > > and there are multiple phone numbers for what I'm indexing under the
> > > > same field (phone) and I want the wildcard query to match only
> > > > records that have either no phone numbers at all OR where ALL phone
> > > > numbers are in a specific area code (e.g. 800* would match all in the
> > > > 800 area code).
> > > 
> > > i can't think of anyway to accomplish the second part of your query.
> > > specificly, given the following records...
> > > 
> > >  Doc1: field1:AAA, field1:Aaa, field1:Bb, field1:C, field2:X, field3:Y
> > >  Doc2: field1:AAA, field1:Aaa, field1:Aa, field2:Z
> > > 
> > > ...i can't think of any type of query like field1:A* which would match
> > > Doc2 but not Doc1 (because there are other field1 values that do
> > > not start with 'A')
> > >
> > > -Hoss



---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Mime
View raw message