tomcat-dev mailing list archives

From Christopher Schultz <ch...@christopherschultz.net>
Subject Re: Tomcat 7 & regex
Date Mon, 27 Dec 2010 20:22:31 GMT
Mark,

On 12/24/2010 1:34 PM, Mark Thomas wrote:
> There are a number of configuration properties defined as "comma
> separated regular expressions". As someone pointed out at ApacheCon
> that is a little odd. It stops "," being used in an expression and is
> inefficient.

A comma can still be used in a regular expression as long as the rules
about how we split the whole value are well-defined (for example,
commas could be escaped for in-regexp use).
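
To illustrate what such a splitting rule might look like, here is a
minimal sketch. It assumes a backslash-escaped comma ("\,") stands for
a comma inside an expression; that convention, the class name and the
helper name are hypothetical, not anything Tomcat currently does.

```java
import java.util.ArrayList;
import java.util.List;

public class CommaSplitSketch {

    // Hypothetical helper: split on commas that are NOT preceded by a
    // backslash, then turn each escaped "\," back into a plain ",".
    // Limitation of this sketch: a comma that follows an escaped
    // backslash ("\\,") is not treated as a separator.
    static List<String> splitPatterns(String value) {
        List<String> result = new ArrayList<>();
        // Regex (?<!\\), means: a comma with no backslash right before it.
        for (String part : value.split("(?<!\\\\),")) {
            String trimmed = part.trim();
            if (!trimmed.isEmpty()) {
                result.add(trimmed.replace("\\,", ","));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Raw configuration value: 127\.0\.0\.1, a{1\,3}
        String value = "127\\.0\\.0\\.1, a{1\\,3}";
        for (String pattern : splitPatterns(value)) {
            System.out.println(pattern);
        }
        // prints:
        // 127\.0\.0\.1
        // a{1,3}
    }
}
```

The extra escaping layer is exactly the kind of rule that would have to
be documented for every affected attribute.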

> Having just been bitten by this while setting up the new Jira instance,
> I intend to change all properties that take regex in Tomcat 7 to use a
> single regex. This will simplify the code, simplify configuration and
> make the regex processing faster.

So the plan would be to have users convert values like this:

127\.0\.0\.1, 10\.10\.10\.1, 192\.168\.1\.[0-9]+

to this:

(127\.0\.0\.1|10\.10\.10\.1|192\.168\.1\.[0-9]+)
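
For anyone migrating an existing value, the conversion above can be
sketched as follows. This assumes none of the old expressions contains
a comma of its own; the class and method names are hypothetical and
only for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AlternationSketch {

    // Hypothetical helper: join the comma-separated expressions into a
    // single alternation wrapped in one group.
    static String toSinglePattern(String commaSeparated) {
        return Arrays.stream(commaSeparated.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.joining("|", "(", ")"));
    }

    public static void main(String[] args) {
        String old = "127\\.0\\.0\\.1, 10\\.10\\.10\\.1, 192\\.168\\.1\\.[0-9]+";
        String single = toSinglePattern(old);
        System.out.println(single);
        // prints: (127\.0\.0\.1|10\.10\.10\.1|192\.168\.1\.[0-9]+)

        // The single pattern is compiled once and matched once per request.
        Pattern p = Pattern.compile(single);
        System.out.println(p.matcher("192.168.1.42").matches()); // true
        System.out.println(p.matcher("192.168.2.42").matches()); // false
    }
}
```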

I have some recommendations:

1. If it's not okay to break the "configuration interface", you should
change the name(s) of the attribute(s) so that old configurations are
easier to adapt to new environments. Something like "allowedIPs" might
become "allowedIPPattern". I'm not sure if incompatibility is something
we're concerned about, though there have been a number of pre-releases
on the 7.0 branch and this sounds like quite a breaking change.

2. Make it clear /which/ regular expressions will be supported. I hate
it when an API says "use a regular expression" and then they don't tell
you they're using Jakarta-ORO which doesn't (conveniently) support
Unicode and you have to spend a long time figuring out why your patterns
aren't working. Presumably, we'll be using the JDK's regular expression
classes: please just state that explicitly.

3. Please make it clear, on a per-attribute basis if appropriate,
whether the pattern will implicitly use start-of-input and end-of-input
markers on the ends. I've been bitten several times by the operational
differences between using Matcher.matches (which is implicitly "^...$")
and Matcher.find/Matcher.replaceAll. Presumably, we'll be using
Matcher.matches and therefore ^...$ is not necessary in any values being
provided by the user: please just state that explicitly.

4. Please ensure that the documentation clearly reminds readers (in each
attribute, rather than requiring the reader to go to a unified short
blurb about regular expressions) that an unescaped "." matches any
character, not just a literal dot. Lots of (otherwise) smart people
often write regular expressions for IP addresses like this: 10.10.1.1.
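
Points 3 and 4 can be illustrated with a small sketch using the JDK's
java.util.regex classes. It assumes the JDK classes are what ends up
being used; the class and helper names are made up for the example.

```java
import java.util.regex.Pattern;

public class MatchSemanticsSketch {

    // Matcher.matches: the whole input must match the pattern.
    static boolean wholeMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Matcher.find: any substring of the input may match the pattern.
    static boolean partialMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String escaped = "10\\.10\\.1\\.1";   // regex: 10\.10\.1\.1
        String unescaped = "10.10.1.1";       // each "." matches any character

        // Point 3: matches() behaves as if the pattern were anchored.
        System.out.println(wholeMatch(escaped, "10.10.1.1"));      // true
        System.out.println(wholeMatch(escaped, "210.10.1.10"));    // false
        System.out.println(partialMatch(escaped, "210.10.1.10"));  // true

        // Point 4: an unescaped dot accepts far more than intended.
        System.out.println(wholeMatch(unescaped, "10x10y1z1"));    // true
        System.out.println(wholeMatch(escaped, "10x10y1z1"));      // false
    }
}
```

With find(), an unanchored pattern would let 210.10.1.10 through, which
is why the documentation should say which method is used.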

Thanks!
-chris

