commons-dev mailing list archives

From "Rami Ojares" <rami.oja...@elisa.fi>
Subject Re: [vfs][all][poll]regular expression library or jdk1.4 as minimum requirement
Date Thu, 24 Jun 2004 11:46:04 GMT
Daniel wrote:

> The Java regex syntax is almost a superset of Perl, which is why I don't
> see the impact of using a Perl engine for JDK 1.3 and java.util.regex
> for J2SE 1.4 as being major.  The expression Rami gave was straight
> Perl 5.005.  jakarta-oro's Perl5Compiler/Perl5Matcher implements
> zero-width look-ahead assertions from Perl 5.003 but does not implement
> the zero-width look-behind assertions from 5.005 and future versions (if
> you don't ask for it ...).  This can be added.  The other difference is
> that in Perl \Q and \E are not part of the regex syntax.  They are part
> of Perl string handling, so we didn't implement them in Perl5Compiler
> (instead quotemeta() is provided), but support them in the Perl5Util
> convenience class.  This can be moved into Perl5Compiler if desired.
> There has to be a user driver for these small things to happen.

Very true. Java has clearly followed in the footsteps of Perl, which
has a much longer history with regexes. The reason they are not
compatible is the lack of standardisation on the Perl side.
Since the Java folks have always put a lot of effort into internationalization,
I think Java regexes have made an extra effort in their handling of Unicode.

If regexes were ever standardized, Perl would deserve the biggest say
on that committee.

For such a standard, however, I feel that every aspect of the language should be
encoded inside the language rather than outside it (like embedded SQL, or quotemeta()
for regexes). Otherwise the language will never be defined exactly and will have "loose boundaries".

> In general, most regular expressions you see in the wild can be
> simplified and don't require unusual constructs.  For example, why
> write "\\Q**\\E" when "\\*\\*" will do (you would usually want to use
> \Q and \E for longer sequences or for dynamically generated strings you
> want to escape; but quotemeta works equally well)?

I am using quoting with dynamic input, so I need the feature.
Now I have been told that I need to support the JAVA, PERL5 and POSIX syntaxes.
So in the case of Java I have to use \\Q and \\E.
In the case of PERL5 I have to use quotemeta().
And in the case of POSIX I have no clue!
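
To make the quoting question concrete, here is a minimal sketch of the two approaches in Java. The class and method names (RegexQuoting, quoteForJava, backslashEscape) are hypothetical and come from neither the thread nor any library. The first method wraps input in \Q...\E for java.util.regex and guards against a literal \E inside the input. The second is a simplified escaper in the spirit of quotemeta(): it puts a backslash before every ASCII character that is not a letter, digit or underscore. Non-ASCII characters are left alone, which is an assumption of this sketch rather than a statement about quotemeta() itself.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names are illustrative only.
final class RegexQuoting {

    // Wrap arbitrary text in \Q...\E for java.util.regex.
    // A literal "\E" in the input would end the quote early, so each one
    // closes the quote, is emitted as an escaped backslash plus 'E',
    // and then the quote is reopened.
    static String quoteForJava(String literal) {
        StringBuilder sb = new StringBuilder("\\Q");
        int from = 0;
        int idx;
        while ((idx = literal.indexOf("\\E", from)) >= 0) {
            sb.append(literal, from, idx).append("\\E\\\\E\\Q");
            from = idx + 2;
        }
        return sb.append(literal.substring(from)).append("\\E").toString();
    }

    // Simplified quotemeta-style escaping: prefix every ASCII character
    // that is not a letter, digit or underscore with a backslash.
    // Assumption: non-ASCII characters are passed through unchanged.
    static String backslashEscape(String literal) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < literal.length(); i++) {
            char c = literal.charAt(i);
            boolean asciiWord = c == '_' || (c >= '0' && c <= '9')
                    || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
            if (c < 128 && !asciiWord) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "a.b**\\E(c)"; // dynamic input containing metacharacters
        System.out.println(quoteForJava(input));
        System.out.println(backslashEscape(input));
        // Both patterns should match the raw input literally.
        System.out.println(Pattern.matches(quoteForJava(input), input));
        System.out.println(Pattern.matches(backslashEscape(input), input));
    }
}
```

From JDK 1.5 onward, java.util.regex.Pattern provides quote(), which covers the first case. The backslash style is the one more likely to carry across engines, since it does not depend on \Q and \E being part of the regex syntax. POSIX is a separate matter that this sketch does not cover: the standard leaves a backslash before an ordinary character undefined, so the set of escaped characters would have to be narrowed to the actual metacharacters.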

> Why use a negative
> look-behind assertion in ((?<!^)|[^/]) when [^/] will suffice (the
> negative look-behind assertion is redundant because if there's a character
> present that's not a slash, then it's not the start of the input)?

Thanks for the tip! I am an occasional regex user :=)

>  Of
> course, you can't always simplify your expressions and I think Rami's point
> is that you shouldn't be bothered with the finer points and stuff should
> just work.

Thank you for understanding my intention so well!

> I think the answer is that as long as you stick to Perl5 syntax
> (which most people using java.util.regex are unknowingly doing), you'll
> rarely run into differences; but that oro doesn't implement most of the
> stuff added after Perl 5.003 for lack of demand (there's not that much stuff).
(And from above)
> There has to be a user driver for these small things to happen.

I think there is a user driver: users should be able to read one
well-written piece of documentation about regexes and then use them worry-free.
Don't you think?

- rami
