jakarta-oro-dev mailing list archives

From "Mark F. Murphy" <ma...@tyrell.com>
Subject Re: End Anchor bug on non-Unix platforms
Date Sun, 10 Jun 2001 13:32:26 GMT
At 7:07 PM -0400 6/8/01, Daniel F. Savarese wrote:
>I'll look at Ed's report this weekend, but it would help if someone
>else could look at it too and offer their view (Takashi? Mark?).  On first
>glance, I don't think it's really a bug because Perl is very particular
>about '\n' being the newline for matching $, not the platform-specific
>file-based end of line delimiter.

I double-checked this and Daniel is correct.

I fired up MacPerl... which is known for making use of the Mac's line 
endings when reading files (input separator $/ is 0x0D).

I ran a test under MacPerl to see if it would do the same for regex:

$test1 = "hello\nworld";
$test2 = "hello\rworld";

$result[0] = "Failed";
$result[1] = "Passed";

print "Start test...\n\n";

print "Test1 " . $result[($test1 =~ /hello$/m)] . "...\n";
print "Test2 " . $result[($test2 =~ /hello$/m)] . "...\n";


Results under MacPerl:


Start test...

Test1 Passed...
Test2 Failed...


Results under perl on Unix:


Start test...

Test1 Passed...
Test2 Failed...


I didn't get a chance to test under Win32... but I'd be surprised if 
it worked any differently.

>One can debate what the proper way to map this Perl behavior to Java is,
>but I would suggest it is in the I/O stage, not the matching stage.
>In other words, write a class that filters the input stream converting the
>platform-specific end of line representation to a Java newline.  It
>may in fact be that Perl dodges the issue by using ANSI C text-mode I/O
>that does the translation.
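
A minimal sketch of the I/O-stage filter Daniel describes, assuming a
plain java.io Reader as the input source. The class name
NewlineNormalizingReader and the normalize() helper are hypothetical,
made up for illustration; nothing here is part of ORO.

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical filter: converts "\r\n" and lone "\r" to "\n" while reading.
class NewlineNormalizingReader extends FilterReader {

    NewlineNormalizingReader(Reader source) {
        // One character of pushback is enough to peek past a '\r'.
        super(new PushbackReader(source, 1));
    }

    @Override
    public int read() throws IOException {
        int c = in.read();
        if (c != '\r') {
            return c;
        }
        // Swallow the '\n' of a "\r\n" pair; push anything else back.
        int next = in.read();
        if (next != '\n' && next != -1) {
            ((PushbackReader) in).unread(next);
        }
        return '\n';
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        // Simple, not fast: fills the buffer one character at a time.
        int count = 0;
        while (count < len) {
            int c = read();
            if (c == -1) {
                break;
            }
            buf[off + count++] = (char) c;
        }
        return count == 0 ? -1 : count;
    }

    // Convenience helper for in-memory text.
    static String normalize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        try (Reader r = new NewlineNormalizingReader(new StringReader(text))) {
            int c;
            while ((c = r.read()) != -1) {
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }
}
```

Wrapping a FileReader in this class before handing the text to the
matcher would let /hello$/m behave the same for Unix, Mac and DOS files.
The bulk read() here keeps reading until the buffer is full or the
stream ends, so it is meant for files rather than interactive input.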

The other thing to do is adjust the regex for the particular target file.

When reading the file in, check for line ending type.  Then adjust 
the regex as needed.
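
The line-ending check could be sketched in Java roughly as follows. The
class and method names (LineEndingDetector, detect) are hypothetical,
and the sketch assumes the first terminator found is representative of
the whole file.

```java
// Hypothetical helper: reports which line terminator a text uses.
class LineEndingDetector {

    // Returns "\r\n", "\r" or "\n", based on the first terminator found.
    // Falls back to "\n" when the text contains no terminator at all.
    static String detect(CharSequence text) {
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '\n') {
                return "\n";
            }
            if (c == '\r') {
                boolean followedByLf =
                        i + 1 < text.length() && text.charAt(i + 1) == '\n';
                return followedByLf ? "\r\n" : "\r";
            }
        }
        return "\n";
    }
}
```

Only the first buffer read from the file needs to be examined, so the
check costs very little compared with rewriting the input.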

So on a Mac I might do the following in perl:

$test2 =~ /hello\r/m


Regular expressions can be built dynamically, so changing the regex is 
probably less expensive than converting the entire input line or buffer.
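
As a rough illustration of building the regex dynamically, here is a
sketch in Java. It uses java.util.regex rather than ORO's own classes,
and the names EndAnchorPattern and endOfLine are hypothetical. Instead
of relying on $, it appends a lookahead for the file's actual line
terminator (or end of input), so the terminator is not consumed by the
match.

```java
import java.util.regex.Pattern;

// Hypothetical helper: anchors a regex body at the end of a line,
// for whatever line terminator the target file uses.
class EndAnchorPattern {

    // body       - the regex to match, e.g. "hello"
    // lineEnding - the terminator found in the file: "\n", "\r" or "\r\n"
    static Pattern endOfLine(String body, String lineEnding) {
        // (?=...) is a lookahead; \z matches only at the very end of input.
        String anchor = "(?=" + Pattern.quote(lineEnding) + "|\\z)";
        return Pattern.compile("(?:" + body + ")" + anchor);
    }
}
```

For a Mac file, endOfLine("hello", "\r") finds "hello" in
"hello\rworld", while endOfLine("hello", "\n") does not, which mirrors
the MacPerl results above.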

mark

-- 
---------------------------------------------------------------------------
  Mark F. Murphy, Director Software Development   <mailto:markm@tyrell.com>
  Tyrell Software Corp                            <http://www.tyrell.com>
  PowerPerl(tm), Add Power To Your Webpage!       <http://www.powerperl.com>
---------------------------------------------------------------------------
  Families Against Internet Censorship:        http://www.netfamilies.org/
