Return-Path:
Mailing-List: contact oro-dev-help@jakarta.apache.org; run by ezmlm
Delivered-To: mailing list oro-dev@jakarta.apache.org
Delivered-To: moderator for oro-dev@jakarta.apache.org
Received: (qmail 78897 invoked from network); 30 Nov 2000 21:00:28 -0000
Received: from mail.kurion.com (HELO kurion?exch.kurion.com) (216.166.12.5) by locus.apache.org with SMTP; 30 Nov 2000 21:00:28 -0000
Received: by kurion_exch.kurion.com with Internet Mail Service (5.5.2650.21) id ; Thu, 30 Nov 2000 14:59:24 -0600
Message-ID: <81CC73FC2FACD311A2D200508B8B88AA1ADACF@kurion_exch.kurion.com>
From: Dan Lipofsky
To: "'oro-user@jakarta.apache.org'", "'oro-dev@jakarta.apache.org'"
Subject: bug with CP1252 characters and Perl5Matcher
Date: Thu, 30 Nov 2000 14:59:22 -0600
MIME-Version: 1.0
X-Mailer: Internet Mail Service (5.5.2650.21)
Content-Type: multipart/mixed; boundary="----_=_NextPart_000_01C05B10.6B1EE160"
X-Spam-Rating: locus.apache.org 1.6.2 0/1000/N

This message is in MIME format. Since your mail reader does not understand
this format, some or all of this message may not be legible.

------_=_NextPart_000_01C05B10.6B1EE160
Content-Type: text/plain; charset="iso-8859-1"

Special characters in the CP1252 character set (but not in ISO Latin-1)
cause an ArrayIndexOutOfBoundsException deep within Perl5Matcher.
The problem occurs if I use "[^.]*\." as the pattern
but not if I use "Test" as the pattern.
The characters are fancy forms of apostrophe and double-quotes
(decimal 146, 147, and 148).
Use IE 5 to view the test files to see what they look like.
I am running Jakarta ORO 2.0, JDK 1.2.2, WinNT 4.0sp5.
Code and test files are attached.

Thanks
- Dan

------_=_NextPart_000_01C05B10.6B1EE160
Content-Type: application/octet-stream; name="Test.java"
Content-Disposition: attachment; filename="Test.java"

/*
Special characters in the CP1252 character set (but not in ISO Latin-1)
cause an error deep within Perl5Matcher.
The problem occurs if I use "[^.]*\\."
as the pattern but not if I use "Test" as the pattern.
The characters are special forms of apostrophe and double-quotes.
Use IE 5 to view the test files to see what they look like.
Running Jakarta ORO 2.0, JDK 1.2.2, WinNT 4.0sp5.
*/

import org.apache.oro.text.regex.*;
import java.io.*;

public class Test {

    // Prints the offsets and text of every match of the pattern in blurb.
    private static void test(String blurb) {
        try {
            Perl5Compiler compiler = new Perl5Compiler();
            // This pattern triggers the exception; the literal one below does not.
            Pattern ptrn = compiler.compile("[^.]*\\.");
            //Pattern ptrn = compiler.compile("Test");
            Perl5Matcher matcher = new Perl5Matcher();
            PatternMatcherInput input = new PatternMatcherInput(blurb);
            while (matcher.contains(input, ptrn)) {
                System.out.println(input.getMatchBeginOffset() + " " +
                                   input.getMatchEndOffset() + " " +
                                   input.match());
            }
        } catch (Exception ex) {
            ex.printStackTrace();
        }
    }

    // Reads the first line of the file. FileReader decodes the bytes with
    // the platform default encoding.
    private static String readLine(String filename) {
        try {
            BufferedReader in = new BufferedReader(new FileReader(filename));
            return in.readLine();
        } catch (IOException ex) {
            ex.printStackTrace();
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        test(readLine(args[0]));
        System.exit(0);
    }
}

------_=_NextPart_000_01C05B10.6B1EE160
Content-Type: text/html; name="file1.html"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="file1.html"

Test 1. It=92s a Test. End Test 1.

------_=_NextPart_000_01C05B10.6B1EE160
Content-Type: text/html; name="file2.html"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="file2.html"

Test 2. =93Another Test=94. End Test 2.

------_=_NextPart_000_01C05B10.6B1EE160--
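
For readers trying to reproduce the report above, it may help to see what bytes 146, 147, and 148 become once Java has decoded them. The sketch below does not use ORO at all, and the class name Cp1252Demo and the helper decode are hypothetical names chosen for illustration. It only shows that decoding these bytes as windows-1252 (Java's name for CP1252) gives characters far above 255, while decoding them as ISO-8859-1 keeps them below 256. Whether that difference is what trips Perl5Matcher is an assumption; the report itself does not say.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class Cp1252Demo {

    // Decodes one byte with the given charset and returns the resulting
    // UTF-16 code unit as an int. (Hypothetical helper, not part of ORO.)
    static int decode(int b, Charset cs) {
        return new String(new byte[] { (byte) b }, cs).charAt(0);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // The three byte values named in the report.
        int[] bytes = { 146, 147, 148 };
        for (int b : bytes) {
            System.out.printf("byte %d: windows-1252 -> U+%04X, ISO-8859-1 -> U+%04X%n",
                    b, decode(b, cp1252), decode(b, StandardCharsets.ISO_8859_1));
        }
    }
}
```

Under windows-1252 the three bytes decode to U+2019, U+201C, and U+201D (right single quote, left and right double quotes); under ISO-8859-1 they decode to the control characters U+0092, U+0093, and U+0094. Test.java reads its input through FileReader, which uses the platform default encoding, so on a Windows machine whose default is CP1252 the matcher would presumably see the first set of values.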