ant-dev mailing list archives

From Stephane Bailliez <sbaill...@imediation.com>
Subject Ant Regexp wrappers [Re: multiline mode and platform issues]
Date Tue, 29 Jan 2002 09:08:45 GMT
Regexp tests don't pass on Windows in multiline matches because of line
separator issues.
 
On Unix:
"endofline\n"  is matched by "endofline$"
 
On Windows:
"endofline\r\n" is matched by "endofline\r$"
 
 
I asked the question on the ORO mailing list, and Daniel Savarese kindly pointed
me to a related post:
http://www.mail-archive.com/oro-dev%40jakarta.apache.org/msg00172.html
 
As you will notice, JDK 1.4 (java.util.regex) does not behave the same way
here: by default it treats every line terminator it knows about as an end of
line, namely '\n', "\r\n", '\r', '\u0085' (next line), '\u2028' (line
separator) and '\u2029' (paragraph separator).
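A minimal sketch of that default, written against java.util.regex on a current JDK (the class and method names are hypothetical, for illustration only): with MULTILINE set and UNIX_LINES left off, "endofline$" finds a match in front of each of the six terminators listed above.

```java
import java.util.regex.Pattern;

public class DefaultTerminators {
    // Line terminators java.util.regex recognizes when UNIX_LINES is NOT set.
    static final String[] TERMINATORS = {"\n", "\r\n", "\r", "\u0085", "\u2028", "\u2029"};

    // True if "endofline$" (MULTILINE, default terminator handling) matches
    // when the word is followed by the given terminator and more text.
    static boolean matchesBefore(String terminator) {
        Pattern p = Pattern.compile("endofline$", Pattern.MULTILINE);
        return p.matcher("endofline" + terminator + "next").find();
    }

    // Renders a terminator as code points, e.g. "U+000D U+000A".
    static String describe(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", (int) c));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        for (String t : TERMINATORS) {
            System.out.println(describe(t) + " -> " + matchesBefore(t));
        }
    }
}
```

Each line printed should report true; a character outside that list, such as a tab, does not end a line and gives false.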
 
To be consistent with the default Perl behavior (which makes sense, since we
may well process documents with '\r\n' on Unix and vice versa), we must use
the UNIX_LINES option, so that only '\n' is recognized as a line terminator.
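Assuming Pattern.UNIX_LINES is a fair stand-in for the Perl semantics that ORO follows (ORO itself is not exercised here), a small Java sketch reproduces both observations from the top of this message. The class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class UnixLinesDemo {
    // MULTILINE plus UNIX_LINES: only '\n' counts as a line terminator, as in Perl.
    static final int PERL_LIKE = Pattern.MULTILINE | Pattern.UNIX_LINES;

    // True if the regex finds a match anywhere in the input with the given flags.
    static boolean found(String regex, String input, int flags) {
        return Pattern.compile(regex, flags).matcher(input).find();
    }

    public static void main(String[] args) {
        String unix = "endofline\nnext";
        String windows = "endofline\r\nnext";
        System.out.println(found("endofline$", unix, PERL_LIKE));            // true
        System.out.println(found("endofline$", windows, PERL_LIKE));         // false: '\r' precedes '\n'
        System.out.println(found("endofline\\r$", windows, PERL_LIKE));      // true
        System.out.println(found("endofline$", windows, Pattern.MULTILINE)); // true: JDK default
    }
}
```

The last call shows why the JDK 1.4 default hides the problem on Windows: without UNIX_LINES, the '\r' in front of '\n' already ends the line.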
 
I'm not sure what makes sense in our case. Results may be inconsistent
depending on the platform and the regexp engine used.
 
- Do we enable UNIX_LINES by default in the JDK 1.4 wrapper? Then what about
platform consistency of builds? It means people must write regexps like
"endofline[\\r]$" (and what is the line terminator on the Mac?). That would
be the easiest way, and it would make sense if we stick to the Perl way for
now, so that all engines behave consistently.
 
- Do we do the above, and also use a system property to switch from
SYSTEM_LINES to UNIX_LINES, filtering the input (replace every system
separator with \n) and the output (replace every \n with the system
separator)? That might be possible and is probably the best solution for
consistency across platforms. Or maybe it is better to modify the pattern
only when it contains a $, prefixing the $ with \r depending on the
platform... ergh.
 
- Do we allow a system property to switch from UNIX_LINES to
use-every-terminator-you-can-find, and implement that for Regexp and ORO as
well? Hardly possible, and I'm not volunteering for this one :-). Plus, I
don't think everything JDK 1.4 does here is strictly correct.
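The first two options could be combined roughly as follows. This is only a sketch under assumptions: the property name ant.regexp.unixlines and every class and method name are hypothetical, not existing Ant API; it targets a current JDK rather than 1.4 (String.replace on CharSequence); and the separator is passed in explicitly so the "\r\n" case can be tried on any platform.

```java
import java.util.regex.Pattern;

public class LineModeSketch {
    // Hypothetical property name, for illustration only; not an existing Ant setting.
    static final String UNIX_LINES_PROPERTY = "ant.regexp.unixlines";

    // Option 1: UNIX_LINES is on unless the property is explicitly "false".
    static int flags(boolean multiline) {
        String value = System.getProperty(UNIX_LINES_PROPERTY, "true");
        int f = multiline ? Pattern.MULTILINE : 0;
        if (!"false".equalsIgnoreCase(value)) {
            f |= Pattern.UNIX_LINES;
        }
        return f;
    }

    // Option 2, "in" filter: every platform separator becomes "\n".
    static String toUnixLines(String text, String separator) {
        return separator.equals("\n") ? text : text.replace(separator, "\n");
    }

    // Option 2, "out" filter: every "\n" becomes the platform separator.
    static String fromUnixLines(String text, String separator) {
        return separator.equals("\n") ? text : text.replace("\n", separator);
    }

    // Replaces every match, normalizing line ends around the engine.
    static String replaceAllPortably(String regex, String input,
                                     String replacement, String separator) {
        Pattern p = Pattern.compile(regex, flags(true));
        String normalized = toUnixLines(input, separator);
        String result = p.matcher(normalized).replaceAll(replacement);
        return fromUnixLines(result, separator);
    }

    public static void main(String[] args) {
        String sep = System.getProperty("line.separator");
        String text = "endofline" + sep + "next" + sep;
        String out = replaceAllPortably("endofline$", text, "END", sep);
        System.out.println(out.equals("END" + sep + "next" + sep)); // true on any platform
    }
}
```

One caveat of the filtering approach: the round trip is not lossless, since a bare '\n' in a Windows input comes back out as "\r\n".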
 
Thoughts?
 
Stephane
