Message-ID: <9B3E950CB293D411ADF4009027B0A4D202BA0DA5@maileu.imediation.com>
From: Stephane Bailliez
To: ant-dev@jakarta.apache.org
Subject: Ant Regexp wrappers [Re: multiline mode and platform issues]
Date: Tue, 29 Jan 2002 09:08:45 -0000

The regexp tests don't pass on Windows for multiline matches because of line-separator issues.

On Unix:    "endofline\n" is matched by "endofline$"
On Windows: "endofline\r\n" is matched by "endofline\r$"

I asked about this on the ORO mailing list, and Daniel Savarese kindly pointed me to a related post:
http://www.mail-archive.com/oro-dev%40jakarta.apache.org/msg00172.html

As you will notice, JDK 1.4 does not behave the same way by default: it treats every line terminator it knows about as the end of a line: '\n', "\r\n", '\r' (carriage return), '\u0085' (next line), '\u2028' (line separator) and '\u2029' (paragraph separator).
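To make the JDK 1.4 difference concrete, here is a small sketch using java.util.regex directly (not Ant's JDK 1.4 wrapper class), in MULTILINE mode with and without the UNIX_LINES flag. The class LineEndDemo and its method multilineFind are hypothetical names made up for this illustration.

```java
import java.util.regex.Pattern;

// Hypothetical demo class: shows how "$" behaves in java.util.regex
// MULTILINE mode, with and without Pattern.UNIX_LINES.
class LineEndDemo {

    // Returns true if 'regex' is found anywhere in 'input', compiled with
    // MULTILINE plus any extra flags (e.g. Pattern.UNIX_LINES).
    static boolean multilineFind(String regex, String input, int extraFlags) {
        return Pattern.compile(regex, Pattern.MULTILINE | extraFlags)
                      .matcher(input)
                      .find();
    }

    public static void main(String[] args) {
        String unixText = "endofline\n";
        String windowsText = "endofline\r\n";

        // Default JDK behaviour: "\r\n" is one line terminator, so "$"
        // matches just before the '\r' and never between '\r' and '\n'.
        System.out.println(multilineFind("endofline$", unixText, 0));       // true
        System.out.println(multilineFind("endofline$", windowsText, 0));    // true
        System.out.println(multilineFind("endofline\\r$", windowsText, 0)); // false

        // With UNIX_LINES only '\n' ends a line (the Perl/ORO behaviour),
        // so a '\r' before the '\n' must be written into the pattern.
        int unix = Pattern.UNIX_LINES;
        System.out.println(multilineFind("endofline$", unixText, unix));       // true
        System.out.println(multilineFind("endofline$", windowsText, unix));    // false
        System.out.println(multilineFind("endofline\\r$", windowsText, unix)); // true
    }
}
```

In other words, UNIX_LINES makes the JDK engine reproduce the Unix/Windows asymmetry shown above for Perl-style engines, while the default flags hide it.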
To be consistent with the default Perl behavior (which makes sense, since we may well have to process documents with '\r\n' line endings on Unix and vice versa), we must use the UNIX_LINES option.

I'm not sure what makes sense in our case. Results might be inconsistent depending on the platform and the regexp engine used.

- Do we enable UNIX_LINES by default in the JDK 1.4 wrapper? Then what about build consistency across platforms? It means people must write regexps like "endofline[\\r]$" (and what is the line terminator on the Mac?). That would be the easiest way, and it would make sense if we stick to the Perl behavior for now so that all engines behave consistently.

- Do we do the above, but use a system property to switch from SYSTEM_LINES to UNIX_LINES and filter the input (replace every system line separator with \n) and the output (replace every \n with the system line separator)? That might be possible and is probably the best solution for consistency across platforms. Or maybe it is better to modify the pattern only when it contains a $, prefixing the $ with \r depending on the platform... ugh.

- Do we add a system property to switch from UNIX_LINES to use-every-terminator-you-can-find, and implement that for Regexp and ORO as well? Hardly possible, and I'm not volunteering for this one :-). Plus, I don't think everything JDK 1.4 does is strictly correct.

Thoughts?

Stephane
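The second option above (filter line separators in, match with UNIX_LINES, filter them back out) might look roughly like the following. This is only a sketch: LineFilterSketch and its methods are hypothetical names, not part of Ant, and the separator is passed in explicitly instead of being chosen through the system property the option mentions.

```java
import java.util.regex.Pattern;

// Hypothetical sketch (not an Ant API): filter line separators in and out
// around a UNIX_LINES | MULTILINE regex, so "$" always means "before \n".
class LineFilterSketch {

    // Filtering in: replace every occurrence of the given separator with '\n'.
    static String toUnix(String text, String separator) {
        return text.replace(separator, "\n");
    }

    // Filtering out: replace every '\n' with the given separator.
    static String fromUnix(String text, String separator) {
        return text.replace("\n", separator);
    }

    // Applies regex -> replacement (java.util.regex replacement syntax)
    // to text whose lines end with 'separator'.
    static String replaceAll(String text, String regex, String replacement,
                             String separator) {
        Pattern p = Pattern.compile(regex, Pattern.UNIX_LINES | Pattern.MULTILINE);
        String unixText = toUnix(text, separator);
        String result = p.matcher(unixText).replaceAll(replacement);
        return fromUnix(result, separator);
    }

    public static void main(String[] args) {
        // The same pattern "end$" works for Windows and Unix line endings.
        System.out.println(replaceAll("a end\r\nb end\r\n", "end$", "EOL", "\r\n")
                .equals("a EOL\r\nb EOL\r\n")); // true
        System.out.println(replaceAll("a end\nb end\n", "end$", "EOL", "\n")
                .equals("a EOL\nb EOL\n"));     // true
        // Real use would pass System.getProperty("line.separator").
    }
}
```

One caveat of this approach: with a "\r\n" separator, a lone '\n' already present in the input (mixed line endings) comes back out as "\r\n".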