ant-dev mailing list archives

From "Stephane Bailliez" <sbaill...@apache.org>
Subject Re: Ant Regexp wrappers [Re: multiline mode and platform issues]
Date Tue, 29 Jan 2002 21:39:28 GMT
----- Original Message -----
From: "Stefan Bodewig" <bodewig@apache.org>
[...]
> If you want to apply any magic at any point, there will always be
> situations where the things Ant does are wrong.  Let people deal with
> these problems explicitly themselves (they could use <fixcrlf> for
> example).

This is *much* more complicated than I expected...

1) Jakarta Oro works fine and is perfectly consistent.

2) Jakarta RegExp uses the platform-dependent line separator, and its logic is
incorrect: on Windows it will always return false when the \n immediately ends
the string. That is a bug; see the code below:

    /** @return true if at the i-th position in the 'search' a newline ends */
    private boolean isNewline(int i) {
        // #### will fail here if the string is "end of line\n" since it
        // compares with "\r\n" size...
        if (i < NEWLINE.length() - 1) {
            return false;
        }

        if (search.charAt(i) == '\n') {
            return true;
        }

        for (int j = NEWLINE.length() - 1; j >= 0; j--, i--) {
            if (NEWLINE.charAt(j) != search.charAt(i)) {
                return false;
            }
        }
        return true;
    }
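
One way to make this check independent of the platform would be to test for a
bare \n before doing anything that depends on the separator length. The
following is only a minimal sketch under that assumption, not the actual
Jakarta RegExp fix; the class name NewlineCheck and the explicit 'search' and
'newline' parameters are hypothetical stand-ins for the fields used above.

```java
/** Hypothetical stand-alone variant of the isNewline() logic quoted above. */
final class NewlineCheck {
    private NewlineCheck() {}

    /**
     * @return true if a newline ends at index i of search: either a bare
     *         line feed or the given separator (for example CR LF)
     */
    static boolean isNewline(String search, String newline, int i) {
        if (i < 0 || i >= search.length()) {
            return false;
        }
        // Accept a line feed first, so the result does not depend on the
        // length of the platform separator.
        if (search.charAt(i) == '\n') {
            return true;
        }
        // Not enough characters before i to hold the whole separator.
        if (newline.isEmpty() || i < newline.length() - 1) {
            return false;
        }
        // Compare the separator backwards, ending at index i.
        for (int j = newline.length() - 1, k = i; j >= 0; j--, k--) {
            if (newline.charAt(j) != search.charAt(k)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String s = "end of line\n";
        // Same input checked against a Windows and a Unix separator.
        System.out.println(isNewline(s, "\r\n", s.length() - 1));
        System.out.println(isNewline(s, "\n", s.length() - 1));
    }
}
```

With this ordering the answer for a trailing \n is the same whichever
separator is passed in.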


3) JDK 1.4 does not seem to honor the UNIX_LINES option for $; it appears to
use it only for normal processing of text.. Yay ! :-( Plus it does not process
the next-line character \u0085..argh !
I did not debug it, but that is what I can roughly read from the code. The
testcase at the end of this mail does not work AT ALL.

Code snippet from JDK 1.4 RC:
        boolean match(Matcher matcher, int i, CharSequence seq) {
            if (i < matcher.to) {
                char ch = seq.charAt(i);
                if (ch == '\n' || (ch|1) == '\u2029') {
                    i++;
                } else if (ch == '\r') {
                    i++;
                    if (i < matcher.to && seq.charAt(i) == '\n') {
                        i++;
                    }
                } else {
                    return false;
                }
                if (multiline == false && i != matcher.to) {
                    return false;
                }


I did the following test:

        reg.setPattern("end of text$");
        assertTrue("Windows line separator", !reg.matches("end of text\r\n"));
        assertTrue("Unix line separator", reg.matches("end of text\n"));
        assertTrue("standalone CR", !reg.matches("end of text\r"));
        assertTrue("next-line character", !reg.matches("end of text\u0085"));
        assertTrue("line-separator character", !reg.matches("end of text\u2028"));
        assertTrue("paragraph character", !reg.matches("end of text\u2029"));
        reg.setPattern("end of text\r$");
        assertTrue("Windows line separator", reg.matches("end of text\r\n"));
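
For comparison, the same expectations can be written directly against
java.util.regex. This is a sketch, assuming the behaviour documented for
Pattern.UNIX_LINES in released JDKs (only \n is treated as a line terminator
for ., ^ and $), which may differ from the 1.4 RC code quoted above. It uses
find() on the assumption that reg.matches means "matches somewhere in the
input"; the class name DollarDemo and its helper methods are made up for
illustration.

```java
import java.util.regex.Pattern;

/** Hypothetical demo class; not part of Ant or the JDK. */
final class DollarDemo {
    private DollarDemo() {}

    /** True if regex is found in input when only a line feed ends a line. */
    static boolean unixFind(String regex, String input) {
        return Pattern.compile(regex, Pattern.UNIX_LINES).matcher(input).find();
    }

    /** Same search with the default set of line terminators. */
    static boolean defaultFind(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String prefix = "end of text";
        String[] inputs = {
            prefix + "\r\n", prefix + "\n", prefix + "\r",
            prefix + "\u0085", prefix + "\u2028", prefix + "\u2029"
        };
        for (String input : inputs) {
            // Show the trailing characters as code points so they stay visible.
            StringBuilder shown = new StringBuilder();
            for (char c : input.substring(prefix.length()).toCharArray()) {
                shown.append(String.format("U+%04X ", (int) c));
            }
            System.out.println(shown
                    + "unix=" + unixFind("end of text$", input)
                    + " default=" + defaultFind("end of text$", input));
        }
    }
}
```

In default mode $ also accepts the other terminators before the end of the
input, so the UNIX_LINES flag is what gives the "only \n counts" semantics
that the assertions above expect.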

Stephane


--
To unsubscribe, e-mail:   <mailto:ant-dev-unsubscribe@jakarta.apache.org>
For additional commands, e-mail: <mailto:ant-dev-help@jakarta.apache.org>

