lucene-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject Re: Test code for regex queries
Date Sun, 04 Dec 2005 16:45:22 GMT
Following up on the (Span)RegexQuery topic, I've started working on  
moving this code to contrib/regex so that it can leverage various  
regex implementations.  I'm making a generic interface that currently  
(though subject to change) has these methods:

   void compile(String pattern);
   boolean match(String string);
   int prefixLength();

I'm going to start by creating implementations for both Jakarta  
Regexp and java.util.regex, and probably Jakarta ORO as well.  I've been  
able to extract the prefix length using Jakarta Regexp, but I don't  
believe this is possible with java.util.regex.  I haven't looked into  
Jakarta ORO deeply enough yet to see whether it makes this available.
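
A minimal sketch of what the interface and a java.util.regex-backed
implementation might look like. The three method signatures are the ones
listed above; the type names RegexEngine and JavaUtilRegexEngine, the
choice of lookingAt() for matching, and the convention that a prefix
length of 0 means "no known prefix" are all assumptions made for
illustration, not the final API.

```java
import java.util.regex.Pattern;

/** Hypothetical name for the implementation-independent interface. */
interface RegexEngine {
    void compile(String pattern);
    boolean match(String string);
    int prefixLength();
}

/** Sketch of an implementation backed by java.util.regex. */
class JavaUtilRegexEngine implements RegexEngine {
    private Pattern pattern;

    public void compile(String pattern) {
        this.pattern = Pattern.compile(pattern);
    }

    public boolean match(String string) {
        // Assumption: match from the start of the term via lookingAt();
        // an implementation could equally choose matches() or find().
        return pattern.matcher(string).lookingAt();
    }

    public int prefixLength() {
        // Assumption: 0 signals that no literal prefix is known, since
        // java.util.regex does not appear to expose one.
        return 0;
    }
}
```

A caller would use it as: `RegexEngine e = new JavaUtilRegexEngine(); e.compile("a.c"); e.match("abcd");`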

(Span)RegexQuery will have a setter for specifying which  
implementation to use, probably defaulting to java.util.regex  
so that it runs without any extra dependencies.

An interesting thing to note...

	Jakarta Regexp: "a.c" matches "abcd"
	java.util.regex: "a.c" does not match "abcd" using Matcher.matches(),
	but it does match using Matcher.lookingAt()
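
The java.util.regex behavior can be checked with a small standalone
program. The class and helper names here are made up for the
demonstration; Matcher.matches(), Matcher.lookingAt() and Matcher.find()
are the standard JDK methods.

```java
import java.util.regex.Pattern;

class MatchModesDemo {
    // The whole input must match the pattern.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // The pattern must match at the start of the input; trailing text is allowed.
    static boolean startMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).lookingAt();
    }

    // The pattern may match anywhere in the input.
    static boolean anywhereMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(wholeMatch("a.c", "abcd"));     // false
        System.out.println(startMatch("a.c", "abcd"));     // true
        System.out.println(startMatch("a.c", "xabcd"));    // false
        System.out.println(anywhereMatch("a.c", "xabcd")); // true
    }
}
```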

In other words, if you want "a.*" to match only terms that begin with  
"a", the regex must logically be specified as "^a.*".  This is not  
really a concern of the regex query itself, but of the underlying  
matching implementation.  For query parsing, though, it would likely  
be desirable to wrap all regex expressions with ^...$ (which is  
generally what users mean when they write "a.*").
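
A sketch of the ^...$ wrapping a query parser could apply, shown with
java.util.regex only. The class and method names are hypothetical, and
the non-capturing group (?:...) is an addition of this sketch: without
it, an alternation such as "a|b" would become "^a|b$", which anchors
each branch separately.

```java
import java.util.regex.Pattern;

class AnchoredRegex {
    /** Wraps a user-supplied expression so it must match the whole term. */
    static String anchor(String regex) {
        return "^(?:" + regex + ")$";
    }

    /** Uses find(), which is unanchored, so the anchoring comes from the pattern itself. */
    static boolean matchesWholeTerm(String regex, String term) {
        return Pattern.compile(anchor(regex)).matcher(term).find();
    }

    public static void main(String[] args) {
        System.out.println(anchor("a.*"));                   // ^(?:a.*)$
        System.out.println(matchesWholeTerm("a.*", "abc"));  // true
        System.out.println(matchesWholeTerm("a.*", "xabc")); // false
        System.out.println(matchesWholeTerm("a|b", "ab"));   // false
    }
}
```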

I'm also considering having the implementation-independent interface  
specify a method to rotate an expression, though this is a more  
advanced feature that perhaps belongs at a different layer.

I'm open to suggestions on all of this.  My main goal is to provide  
a general-purpose regular expression query that is as fast as  
possible by minimizing term enumeration.
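
To illustrate why a known prefix length helps, here is a rough sketch
that uses a sorted set of strings as a stand-in for the term dictionary.
Everything in it is an assumption for illustration: the class and method
names, the use of a TreeSet instead of a real term enumeration, and the
premise that the first prefixLength characters of the pattern are
literal text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

class PrefixEnumerationSketch {
    /**
     * Returns the terms that fully match the regex, visiting only terms that
     * share the literal prefix instead of walking the whole sorted set.
     */
    static List<String> matchingTerms(SortedSet<String> terms, String regex, int prefixLength) {
        // Assumption: the first prefixLength characters of the regex are literals.
        String prefix = regex.substring(0, prefixLength);
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        // Seek to the first term >= prefix, then stop once the prefix no longer matches.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(List.of("ab", "abc", "abd", "axc", "bcd"));
        System.out.println(matchingTerms(terms, "ab.", 2)); // [abc, abd]
    }
}
```

With a prefix length of 0 the loop degrades to scanning every term, which is the case the prefix is meant to avoid.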

	Erik

