jakarta-regexp-dev mailing list archives

From Erik Hatcher <e...@ehatchersolutions.com>
Subject static prefix - REProgram.prefix package protected
Date Wed, 28 Dec 2005 15:02:28 GMT
Hi -

First a little background.  I'm implementing regular expression  
search capabilities within Lucene:

	http://svn.apache.org/viewcvs.cgi/lucene/java/trunk/contrib/regex/

I've made it pluggable such that any regular expression  
implementation can be used.  One detail that is very desirable within  
Lucene when doing multi-term queries such as wildcard, fuzzy, or  
regular expression matching is to narrow the number of terms  
enumerated.  Because terms (think of these as simply words) are in  
lexicographical order, picking the best starting point is crucial to  
the best performance.  In the most naive implementations of such  
enumeration, every term in the index is enumerated, which can be a  
real performance killer.
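
To make the idea concrete, here is a rough sketch using plain Java
collections rather than Lucene's actual term enumeration API.  The
class name PrefixScan and the method matchingTerms are made up for
illustration, a TreeSet stands in for the sorted term dictionary, and
the sketch assumes the regular expression must match the whole term,
so every match necessarily starts with the literal prefix:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;
import java.util.regex.Pattern;

// Hypothetical illustration: not Lucene or Jakarta Regexp code.
public class PrefixScan {

    /**
     * Returns the terms that fully match the regex, visiting only the
     * terms that start with the given literal prefix.
     */
    public static List<String> matchingTerms(SortedSet<String> terms,
                                             String prefix,
                                             String regex) {
        Pattern pattern = Pattern.compile(regex);
        List<String> hits = new ArrayList<>();
        // tailSet() seeks straight to the first term >= prefix,
        // skipping everything that sorts before it.
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // sorted order: no later term can share the prefix
            }
            if (pattern.matcher(term).matches()) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        SortedSet<String> terms = new TreeSet<>(
            Arrays.asList("bar", "fool", "foo", "foobar", "fop", "zebra"));
        // Only "foo", "foobar", "fool" and "fop" are visited.
        System.out.println(matchingTerms(terms, "foo", "foo.*"));
    }
}
```

With an empty prefix the loop degenerates into the naive scan over
every term, which is exactly the case worth avoiding.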

For a regular expression such as "foo.*" it is desirable to have the  
prefix "foo" to speed up term enumeration (yes, I know that "foo.*"  
can match anywhere in a string, not necessarily at the beginning -  
I'm accounting for this in other ways that I can describe if desired).   
Jakarta Regexp provides this as a package protected internal variable  
REProgram.prefix.  I have written a little hack gateway to give this  
to me:

   package org.apache.regexp;

   /**
    * This class exists as a gateway to access useful Jakarta Regexp
    * package protected data.
    */
   public class RegexpTunnel {
     public static char[] getPrefix(RE regexp) {
       REProgram program = regexp.getProgram();
       return program.prefix;
     }
   }


Would it be possible to add a public getter to return this prefix?

I realize that Jakarta Regexp is not very actively maintained, so I'm curious  
about other regex implementations and whether they can also provide  
this handy prefix, or if anyone has suggestions along these lines.  I  
did not see this capability within ORO or java.util.regex.
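
For an implementation that exposes no prefix, one possible fallback is
to derive a conservative prefix from the pattern string itself.  The
sketch below is only an assumption about how that might look, not
anything ORO or java.util.regex provides: the class LiteralPrefix is
hypothetical, it assumes matching is anchored at the start of the term
with no case-insensitive flag, and it gives up (returns an empty
prefix) whenever it is unsure:

```java
// Hypothetical helper: not part of any regex library.
public class LiteralPrefix {

    /** Returns the leading run of literal letters/digits, or "" if unsure. */
    public static String of(String regex) {
        // Alternation may mean there is no common prefix; give up.
        if (regex.indexOf('|') >= 0) {
            return "";
        }
        int end = 0;
        while (end < regex.length()
                && Character.isLetterOrDigit(regex.charAt(end))) {
            end++;
        }
        // A following quantifier may allow zero repetitions of the last
        // literal (as in "foo*" or "foo?"), so drop that character.
        if (end > 0 && end < regex.length()) {
            char next = regex.charAt(end);
            if (next == '*' || next == '?' || next == '{') {
                end--;
            }
        }
        return regex.substring(0, end);
    }

    public static void main(String[] args) {
        System.out.println(LiteralPrefix.of("foo.*"));   // foo
        System.out.println(LiteralPrefix.of("foo*bar")); // fo
    }
}
```

A shorter prefix than the true one only costs some extra enumeration;
a longer one would silently drop matches, which is why the sketch errs
on the short side.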

Thanks,
	Erik

