jakarta-oro-dev mailing list archives

From "Daniel F. Savarese" <...@savarese.org>
Subject Re: [PATCH] [:^foo:] and \p{IsDigit}...
Date Tue, 20 Feb 2001 03:48:25 GMT

>BUT Xerces-J (IBM regex4j) implemented the \p feature, so I implemented it.
>In Xerces-J, '\d' includes only the ASCII characters '0'-'9', but \p includes
>all Unicode numerical characters. '\d' in ORO isn't so.
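The distinction in the quoted message, where `\d` covers only ASCII '0'-'9' and `\p` covers Unicode digits more broadly, can be illustrated without either regex package. The sketch below is hypothetical: the class `DigitClasses` and its two helpers are invented for illustration and do not call ORO or Xerces-J. It contrasts an ASCII-only range test with `java.lang.Character.isDigit`, which accepts any character whose Unicode general category is DECIMAL_DIGIT_NUMBER (Nd). Treating Nd as what `\p{IsDigit}` targets is an assumption made for this example.

```java
// Hypothetical illustration: ASCII-only digit test vs. Unicode decimal digits.
// Neither ORO nor Xerces-J is used here; this only shows the two classifications.
final class DigitClasses {

    // Accepts only '0'..'9', the behavior the quoted message attributes to \d in Xerces-J.
    static boolean isAsciiDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Accepts any Unicode decimal digit (general category Nd).
    // Assumption: this approximates what \p{IsDigit} is meant to match.
    static boolean isUnicodeDigit(char c) {
        return Character.isDigit(c);
    }

    public static void main(String[] args) {
        // '7' (ASCII), ARABIC-INDIC DIGIT THREE, DEVANAGARI DIGIT ONE, and a letter.
        char[] samples = { '7', '\u0663', '\u0967', 'x' };
        for (char c : samples) {
            System.out.printf("U+%04X ascii=%b unicode=%b%n",
                    (int) c, isAsciiDigit(c), isUnicodeDigit(c));
        }
    }
}
```

Running `main` prints one line per sample; the Arabic-Indic and Devanagari digits report `ascii=false unicode=true`, which is exactly the gap between an ASCII-only `\d` and a Unicode-aware digit class.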

I'll have to review what regex4j does (didn't even know the package
existed even though I use Xerces).  I am guessing it implements it
because it processes a raw byte stream coming from a file.  I still
hold that \p has no meaning when your input is always in Unicode.  I guess
this brings to the fore the need to write up what the principles of
being "compatible" with Perl mean for the org.apache.oro.text.regex
package.  There is a general idea of omitting those things that
are present in Perl regular expressions that don't make any sense
in the Java environment (e.g., we will never implement (?{ code })).

>I can remove it. I never use \p expressions with ORO :)
>What do you think about this?

I think we should keep it out until there's a compelling reason to put
it in, since I would posit that no one will ever use it unless they have a
bunch of Perl regular expressions stored in a file somewhere that they
feed as input to a Java rewrite of a Perl program.  I'd rather focus on
adding things like zero-width lookbehind assertions that people have been
clamoring for.

