lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: regex-based query contribution
Date Thu, 13 Oct 2005 10:54:21 GMT

On Oct 13, 2005, at 3:15 AM, Paul Elschot wrote:
>> The main negative to this query, just like with WildcardQuery and
>> FuzzyQuery, is the possible performance issue.  However, just like
>> WildcardQuery, this really depends on how clever the indexing side of
>> things is and matching that cleverness with an appropriate regex.  My
>> actual use of these queries involves doing overlapped rotated term
>> indexing and also rotating the query term to have the best possible
>> prefix for term enumeration.  Naive use of this query using ".*foo"
>> will, of course, have the same impact as WildcardQuery using *foo, and
>> may be slightly slower because regex matching is involved.
>>
>> Overall, I think it is a good addition and will allow users to be
>> more expressive than the lower-level MultiPhraseQuery (aka
>> PhrasePrefixQuery).
>>
>> Thoughts?
>>
>
> In the surround language, this was done by splitting the query term
> into a fixed prefix and a remainder starting with a truncation
> character. For this remainder a regular expression is built and used.
> The prefix is used to limit the number of terms fed to the regular
> expression matcher. The code is in SrndTruncQuery.java here:
> http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/surround/src/java/org/apache/lucene/queryParser/surround/query/
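A minimal Java sketch of the prefix/remainder split described above, assuming '*' (any sequence) and '?' (exactly one character) are the truncation characters. The class TruncSplit and its methods are hypothetical illustrations, not the contents of SrndTruncQuery.java. Literal characters in the remainder are escaped with Pattern.quote so they cannot be misread as regex syntax.

```java
import java.util.regex.Pattern;

/** Hypothetical sketch of splitting a truncation query into a literal prefix
 *  and a regex for the remainder. Assumes '*' = any sequence, '?' = one char.
 *  This is NOT the actual SrndTruncQuery code. */
class TruncSplit {
    final String prefix;      // literal part before the first truncation char
    final Pattern remainder;  // regex for everything from that char onward

    TruncSplit(String prefix, Pattern remainder) {
        this.prefix = prefix;
        this.remainder = remainder;
    }

    static TruncSplit split(String query) {
        // Find the first truncation character; everything before it is the prefix.
        int i = 0;
        while (i < query.length() && query.charAt(i) != '*' && query.charAt(i) != '?') {
            i++;
        }
        // Translate the remainder into a regular expression.
        StringBuilder regex = new StringBuilder();
        for (int j = i; j < query.length(); j++) {
            char c = query.charAt(j);
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return new TruncSplit(query.substring(0, i), Pattern.compile(regex.toString()));
    }

    /** A term matches if it starts with the prefix and the rest matches the regex. */
    boolean matches(String term) {
        return term.startsWith(prefix)
            && remainder.matcher(term.substring(prefix.length())).matches();
    }

    public static void main(String[] args) {
        TruncSplit t = split("luc*ne?");
        System.out.println(t.prefix);              // luc
        System.out.println(t.matches("lucenes"));  // true
        System.out.println(t.matches("lucene"));   // false
    }
}
```

Only terms starting with "luc" ever reach the regex matcher, which is why a longer fixed prefix means less work.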

Likewise with my PatternQuery: it limits the term enumeration to the
fixed prefix, just as WildcardQuery does.
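To illustrate what limiting the enumeration to a fixed prefix buys, here is a sketch under stated assumptions: a TreeSet stands in for Lucene's sorted term dictionary, a pattern must match the whole term, and the hypothetical literalPrefix helper derives the prefix conservatively (it drops a character that is followed by a quantifier and gives up entirely on alternation). It is not PatternQuery's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

/** Sketch of prefix-limited regex term enumeration. A TreeSet stands in for a
 *  sorted term dictionary; this is not PatternQuery's actual code. */
class PrefixRegexEnum {
    private static final String META = ".*+?[](){}|\\^$";

    /** Conservative literal prefix that every full match must start with. */
    static String literalPrefix(String regex) {
        if (regex.indexOf('|') >= 0) return "";   // alternation: no safe prefix
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < regex.length(); i++) {
            char c = regex.charAt(i);
            if (META.indexOf(c) < 0) {
                prefix.append(c);
                continue;
            }
            // A quantifier makes the preceding character optional or repeatable,
            // so that character cannot be part of the guaranteed prefix.
            if ((c == '*' || c == '+' || c == '?' || c == '{') && prefix.length() > 0) {
                prefix.setLength(prefix.length() - 1);
            }
            break;
        }
        return prefix.toString();
    }

    static List<String> matchingTerms(NavigableSet<String> terms, String regex) {
        Pattern pattern = Pattern.compile(regex);
        String prefix = literalPrefix(regex);
        List<String> hits = new ArrayList<>();
        // Seek to the first term >= prefix; stop once terms leave the prefix range.
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) break;
            if (pattern.matcher(term).matches()) hits.add(term);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of("foo", "food", "fool", "bar", "seafood"));
        System.out.println(literalPrefix("foo[dl]"));        // foo
        System.out.println(matchingTerms(terms, "foo[dl]")); // [food, fool]
        System.out.println(literalPrefix(".*foo"));          // prints an empty line: no usable prefix
        System.out.println(matchingTerms(terms, ".*foo"));   // [foo]
    }
}
```

With ".*foo" the prefix is empty, so every term in the dictionary is handed to the matcher, which is the same cost profile as WildcardQuery with *foo.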

> So, with an addition to the javadocs that the length of the prefix is
> important for performance, I think a regular-expression-based query
> term would be very useful, especially when combined with an analyzer
> that does appropriate term rotation.

Right, I just mentioned the caveat to have the bases covered.  It
would be possible to do a PatternQuery(".*") that would enumerate
every term.  At this point, anyone using such a query would have to
do it through the API, just as they would with the SpanQuery family,
so it would be for power users who hopefully would understand how
these queries work.

And with term rotation, as you say, things get much, much better!
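The rotation scheme is not spelled out in this thread, so the following is only a sketch of one well-known variant, permuterm-style rotation, assuming '$' never occurs in real terms. Each term is indexed as every rotation of term + "$", which turns a leading-wildcard query such as *foo into a prefix scan on "foo$". RotatedTerms and its methods are hypothetical names, and this is not necessarily the "overlapped rotated term indexing" used with PatternQuery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Permuterm-style rotation sketch (an assumed scheme). Every rotation of
 *  term + END is indexed, so "head*tail" becomes a prefix scan on "tail$head". */
class RotatedTerms {
    static final char END = '$'; // assumed end-of-term marker; must not occur in terms

    /** All rotations of term + END, e.g. "foo" -> foo$, oo$f, o$fo, $foo. */
    static List<String> rotations(String term) {
        String t = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < t.length(); i++) {
            out.add(t.substring(i) + t.substring(0, i));
        }
        return out;
    }

    /** Rotates a query of the form "head*tail" into the prefix "tail$head". */
    static String rotatedPrefix(String head, String tail) {
        return tail + END + head;
    }

    /** Recovers the original term from one of its rotations. */
    static String unrotate(String rotation) {
        int end = rotation.indexOf(END);
        return rotation.substring(end + 1) + rotation.substring(0, end);
    }

    public static void main(String[] args) {
        NavigableSet<String> index = new TreeSet<>();
        for (String term : List.of("foo", "seafoo", "food", "bar")) {
            index.addAll(rotations(term));
        }
        // "*foo" (head "", tail "foo") becomes a prefix scan on "foo$".
        String prefix = rotatedPrefix("", "foo");
        List<String> hits = new ArrayList<>();
        for (String r : index.tailSet(prefix, true)) {
            if (!r.startsWith(prefix)) break;
            hits.add(unrotate(r));
        }
        System.out.println(hits); // [foo, seafoo]
    }
}
```

The trade-off is index size: a term of length n contributes n + 1 entries, in exchange for never having to scan the whole dictionary for a leading wildcard.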

     Erik


---------------------------------------------------------------------
To unsubscribe, e-mail: java-dev-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-dev-help@lucene.apache.org

