lucene-java-user mailing list archives

From Ronald Rudy <ronchal...@gmail.com>
Subject Re: Boolean expression for no terms OR matching a wildcard
Date Mon, 21 Jul 2008 19:05:39 GMT
Thanks Steve, this looks promising even if it doesn't perform the  
best.  I'll run some tests on what produces the best results.

-Ron


On Jul 21, 2008, at 3:00 PM, Steven A Rowe wrote:

> Hi Ronald,
>
> Caveat - I haven't tested this, but:
>
> With a RegexQuery <http://lucene.apache.org/java/2_3_2/api/org/apache/lucene/search/regex/RegexQuery.html>,
> I think you can do something like (using your example):
>
>   +abc*123 -{Regex}(?!abc.*123$)
>
> This query would include all documents that have terms that match  
> the wildcard "abc*123", and exclude all documents containing terms  
> that don't match regex "^abc.*123$".
>
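
To make the include/exclude logic above concrete, here is a minimal plain-Java sketch that simulates it on lists of terms. It does not use Lucene at all; the class and method names (AllTermsMatchSketch, hit, keep) are hypothetical, and it assumes the lookingAt() matching behavior that Steve describes for RegexQuery.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Plain-Java simulation of "+abc*123 -{Regex}(?!abc.*123$)"; no Lucene involved.
// All names here are hypothetical and exist only for illustration.
final class AllTermsMatchSketch {

    // Regex form of the wildcard "abc*123"; the trailing "$" anchors the end.
    static final Pattern WILDCARD = Pattern.compile("abc.*123$");

    // Negative lookahead: succeeds (with an empty match) only on terms
    // that do NOT match "abc.*123$" from the start.
    static final Pattern NOT_WILDCARD = Pattern.compile("(?!abc.*123$)");

    // lookingAt() always starts at the beginning of the term.
    static boolean hit(Pattern p, String term) {
        return p.matcher(term).lookingAt();
    }

    // Keep a document if at least one term matches the wildcard (the "+" clause)
    // and no term matches the negated pattern (the "-" clause).
    static boolean keep(List<String> terms) {
        boolean required = false;
        for (String t : terms) {
            if (hit(NOT_WILDCARD, t)) {
                return false;
            }
            if (hit(WILDCARD, t)) {
                required = true;
            }
        }
        return required;
    }

    public static void main(String[] args) {
        System.out.println(keep(Arrays.asList("abc123", "abcXYZ123"))); // true
        System.out.println(keep(Arrays.asList("abc123", "xyz789")));    // false
    }
}
```

Note that a document with no terms at all is rejected here, because the "+" clause requires at least one match; the "no phone numbers" half of the original question would need separate handling.
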
> Note that the Lucene QueryParser doesn't handle regex queries (and  
> if it did, the syntax would probably be different than "{Regex}" -  
> this was intended solely for purposes of exposition).  As a result,  
> you would have to manually construct the RegexQuery and combine it  
> using BooleanQuery clauses with your wildcard query.
>
> The "(?!...)" syntax is a negative lookahead assertion - this is a  
> Java 1.4+ java.util.regex.Pattern feature.  Note that wildcard  
> expressions are easily programmatically converted to regular  
> expressions by substituting "*"->".*" and "?"->".", and then adding  
> the "$" anchor.  The "^" anchor is not required with RegexQuery,
> because when using the java.util.regex engine (the default engine),
> j.u.r.Matcher.lookingAt() is used; from
> <http://java.sun.com/j2se/1.4.2/docs/api/java/util/regex/Matcher.html#lookingAt()>:
>
>   Attempts to match the input sequence, starting at the
>   beginning, against the pattern.
>
>   Like the matches method, this method always starts at the
>   beginning of the input sequence; unlike that method, it
>   does not require that the entire input sequence be matched.
>
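
The wildcard-to-regex substitution and the lookingAt() behavior described above can be sketched with java.util.regex alone. The class and method names (WildcardToRegexSketch, toRegex) are hypothetical, and escaping the other regex metacharacters is an addition that the substitution rule above does not mention.

```java
import java.util.regex.Pattern;

// Hypothetical helper: converts a wildcard expression to a regex for use
// with Matcher.lookingAt(), which anchors at the start but not at the end.
final class WildcardToRegexSketch {

    static String toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '*') {
                sb.append(".*");            // "*" -> ".*"
            } else if (c == '?') {
                sb.append('.');             // "?" -> "."
            } else if (Character.isLetterOrDigit(c) || Character.isSurrogate(c)) {
                sb.append(c);               // plain literal
            } else {
                sb.append('\\').append(c);  // assumption: escape other characters
            }
        }
        return sb.append('$').toString();   // add the end anchor; "^" is not needed
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile(toRegex("abc*123")); // "abc.*123$"
        System.out.println(p.matcher("abcdef123").lookingAt());  // true
        System.out.println(p.matcher("abcdef1234").lookingAt()); // false: "$" fails
        System.out.println(p.matcher("xabc123").lookingAt());    // false: must start at 0

        // Without "$", lookingAt() accepts a matching prefix; matches() does not.
        Pattern noEnd = Pattern.compile("abc.*123");
        System.out.println(noEnd.matcher("abc1234").lookingAt()); // true
        System.out.println(noEnd.matcher("abc1234").matches());   // false
    }
}
```
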
> Caveat #2: RegexQuery instances are relatively slow, since *all* index
> terms have to be tested against the regular expression, so you may have
> to use some other method if query response time turns out to be a
> problem.
>
> Steve
>
> On 07/20/2008 at 8:29 AM, Ronald Rudy wrote:
>> A query solution is preferable, but I can programmatically
>> filter my results after the fact. It just seems like something that
>> the Lucene team should consider adding. I think it would only have
>> value for wildcard queries, but nonetheless it would have some value.
>>
>> -Ron
>>
>> On Jul 18, 2008, at 6:24 PM, eks dev wrote:
>>
>>> Analyzer that detects your condition "ALL match something", if
>>> possible at all...
>>> e.g. "800123456 80034543534 80023423423" -> 800
>>>
>>> then you put it in an ALL_MATCH field and match this condition against
>>> it... if this prefix needs to be variable, you could extract all
>>> matching prefixes to this field and make your query work like
>>> "ALL_MATCH:800" and care not for the rest :) then you would not need
>>> field1 at all for these queries
>>>
>>> Were you looking for something like this, or do you need a "Query
>>> solution"?
>>>
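
As a rough illustration of the ALL_MATCH idea above, the values to index could be derived at indexing time from the longest prefix shared by all of a record's phone numbers. This is a plain-Java sketch only: the names (AllMatchPrefixSketch, commonPrefix, allMatchTerms) are hypothetical, and wiring the result into an Analyzer or a document field is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for deriving ALL_MATCH terms from a record's values.
final class AllMatchPrefixSketch {

    // Longest prefix shared by every value; "" if there is none or no values.
    static String commonPrefix(List<String> values) {
        if (values.isEmpty()) {
            return "";
        }
        String prefix = values.get(0);
        for (String v : values) {
            int max = Math.min(prefix.length(), v.length());
            int i = 0;
            while (i < max && prefix.charAt(i) == v.charAt(i)) {
                i++;
            }
            prefix = prefix.substring(0, i);
        }
        return prefix;
    }

    // Every non-empty prefix of the common prefix, so that a variable-length
    // query such as ALL_MATCH:8, ALL_MATCH:80 or ALL_MATCH:800 can match exactly.
    static List<String> allMatchTerms(List<String> values) {
        String p = commonPrefix(values);
        List<String> terms = new ArrayList<String>();
        for (int len = 1; len <= p.length(); len++) {
            terms.add(p.substring(0, len));
        }
        return terms;
    }
}
```

For the example numbers "800123456 80034543534 80023423423", commonPrefix yields "800" and allMatchTerms yields "8", "80" and "800".
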
>>> ----- Original Message ----
>>>> From: Chris Hostetter <hossman_lucene@fucit.org>
>>>> To: java-user@lucene.apache.org
>>>> Sent: Saturday, 19 July, 2008 12:00:39 AM
>>>> Subject: Re: Boolean expression for no terms OR matching a wildcard
>>>>
>>>>> Maybe this is easier ... suppose what I'm indexing is a phone number,
>>>>> and there are multiple phone numbers for what I'm indexing under the
>>>>> same field (phone) and I want the wildcard query to match only
>>>>> records that have either no phone numbers at all OR where ALL phone
>>>>> numbers are in a specific area code (e.g. 800* would match all in the
>>>>> 800 area code).
>>>>
>>>> i can't think of any way to accomplish the second part of your query.
>>>> specifically, given the following records...
>>>>
>>>> Doc1: field1:AAA, field1:Aaa, field1:Bb, field1:C, field2:X, field3:Y
>>>> Doc2: field1:AAA, field1:Aaa, field1:Aa, field2:Z
>>>>
>>>> ...i can't think of any type of query like field1:A* which would match
>>>> Doc2 but not Doc1 (because there are other field1 values that do
>>>> not start with 'A')
>>>>
>>>> -Hoss
>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>



