ant-user mailing list archives

From matsuha...@quick.co.jp
Subject ReplaceRegExp task, IndexOutOfBoundsException, an I18N problem
Date Thu, 23 Jan 2003 08:02:08 GMT
(Before this post, I sent a similar one with a ZIP attachment, which a
mail administrator rejected for security reasons. My apologies.)

Hi, I am new to the Ant user mailing list.

Recently I noticed that the ReplaceRegExp task was added in Ant 1.5.1. I
tried it and soon found that it does not work for a file containing
Japanese characters. When I run it, it only says:

[replaceregexp] An error occurred processing file: 'filename with Japanese
char': java.lang.IndexOutOfBoundsException

It works fine for us-ascii files.

I downloaded the Ant 1.5.1 source bundle and looked at ReplaceRegExp.java.
I soon found that the way the class reads a file does not seem appropriate
for files in a double-byte character encoding.
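
To illustrate the point about byte length versus character count, here is a minimal sketch. It uses UTF-8 rather than Shift_JIS only because UTF-8 is guaranteed to be present on every JVM; that substitution, the class name ByteVsCharCount and its method names are assumptions made for illustration, and StandardCharsets needs a much newer JDK than 1.3.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Ant: compares the encoded size of a
// string (what File.length() would report) with its length in chars.
final class ByteVsCharCount {

    // Number of bytes the text occupies once encoded as UTF-8.
    static int byteCount(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of Java chars, which is what a char[] buffer must hold.
    static int charCount(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        String ascii = "japan";
        String kanji = "\u65E5\u672C"; // the two characters from the Japanese test file
        System.out.println("ascii: " + byteCount(ascii) + " bytes, "
            + charCount(ascii) + " chars");
        System.out.println("kanji: " + byteCount(kanji) + " bytes, "
            + charCount(kanji) + " chars");
    }
}
```

For the ASCII string both counts are 5; for the two kanji the encoded form is 6 bytes but only 2 chars, so a buffer sized from the file length is larger than the number of chars a reader will ever deliver.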

I wrote a demo program that reproduces the problem in the original
ReplaceRegExp and also proposes a fix. The demo program is at the end
of this email.

When you run the demo.ReplaceRegExpRefactoring class twice from the
command line, first with a us-ascii file and then with a Japanese-character
file, you will see the following messages:

*******************************************************************
... invoked doReplaceAsOriginal(etc\data_us-ascii.txt)
----------file------------
japan
--------------------------

... invoked doReplaceI18N(etc\data_us-ascii.txt)
----------file------------
japan
--------------------------

... invoked doReplaceAsOriginal(etc\data_jp-Shift_JIS.txt)
java.lang.IndexOutOfBoundsException
     at java.io.BufferedReader.read(BufferedReader.java:251)
     at demo.ReplaceRegExpRefactoring.doReplaceAsOriginal(ReplaceRegExpRefactoring.java:45)
     at demo.ReplaceRegExpRefactoring.main(ReplaceRegExpRefactoring.java:94)
... invoked doReplaceI18N(etc\data_jp-Shift_JIS.txt)
----------file------------
日本
--------------------------
******************************************************************
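
The stack trace points at BufferedReader.read, so the following is a rough sketch of why the second call in the original loop seems to fail. It runs the same loop against an in-memory stream instead of a file and uses UTF-8 instead of Shift_JIS; both are assumptions made to keep the example portable, and the class name SecondReadFails is hypothetical. Note that FileReader is itself a subclass of InputStreamReader, so the exception appears to come from the offset and length passed to read rather than from a missing decoder.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction of the read loop, using bytes held in memory.
final class SecondReadFails {

    // Returns "ok" when the loop completes, or the name of the exception.
    static String readLikeOriginal(String content) {
        byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
        int flen = bytes.length; // plays the role of (int) f.length()
        char[] tmpBuf = new char[flen];
        int numread = 0;
        int totread = 0;
        try (BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8))) {
            while (numread != -1 && totread < flen) {
                // On the second pass totread > 0, so totread + flen exceeds
                // tmpBuf.length and read() rejects the arguments.
                numread = br.read(tmpBuf, totread, flen);
                totread += numread;
            }
            return "ok";
        } catch (IndexOutOfBoundsException e) {
            return "IndexOutOfBoundsException";
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("ascii: " + readLikeOriginal("japan"));
        System.out.println("kanji: " + readLikeOriginal("\u65E5\u672C"));
    }
}
```

With ASCII input the first read fills the whole buffer and the loop ends after one pass. With multi-byte input the first read returns fewer chars than flen, the loop runs again, and the second call asks for flen chars starting at a non-zero offset, which no longer fits in the array.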


I used Ant 1.5.1 on Windows 2000 with JDK 1.3.1-05.

I appreciate the ReplaceRegExp task very much. I want to use it to
preprocess JSP files, stripping out the whitespace that is there for code
readability, in order to reduce the JSP response size. My JSPs contain
many Japanese characters, so the problem described above is significant
for me.



     Thank you in advance.

     MATSUHASHI, kazuaki (FAMILY,given)



--------------demo program------------------------------------------
package demo;

import java.io.*;

/**
 * A trivial program to demonstrate an I18N problem in the "ReplaceRegExp"
 * task of Ant 1.5.1.
 *
 * The Jakarta Ant 1.5.1 optional task "ReplaceRegExp" seems to have an I18N
 * problem. Given an input file that contains non-ASCII (double-byte)
 * characters, ReplaceRegExp fails with java.lang.IndexOutOfBoundsException.
 * Looking at the source code, two points stand out:
 * 1. It reads through FileReader, which always decodes with the platform
 *    default encoding; an explicit InputStreamReader is what lets the
 *    caller control the conversion from the local encoding to Unicode.
 * 2. It assumes that the char[] needed to buffer the entire input file
 *    has exactly as many elements as the file has bytes. That holds for
 *    ASCII characters, but not for double-byte characters.
 *
 * This code shows a fix for the problem described above.
 *
 * @author MATSUHASHI,kazuaki
 * @date 2002-01-23
 */
public class ReplaceRegExpRefactoring {

    // Size of the Unicode character buffer, in chars, not bytes.
    private static final int BUFFER_SIZE = 1000;

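    /**
     * A copy cut from the original: reads a file the same way the original
     * ReplaceRegExp does. It falls short on I18N. The loop is left as it
     * is on purpose, to reproduce the failure.
     */
    private void doReplaceAsOriginal(File f) throws Exception {
        FileReader r = new FileReader(f);
        BufferedReader br = new BufferedReader(r);
        int flen = (int) f.length(); // length in bytes, not in chars
        char tmpBuf[] = new char[flen];
        int numread = 0;
        int totread = 0;
        while (numread != -1 && totread < flen) {
            // If the first read returns fewer than flen chars (multi-byte
            // input), the next call has totread + flen > tmpBuf.length
            // and throws IndexOutOfBoundsException.
            numread = br.read(tmpBuf, totread, flen);
            totread += numread;
        }
        String buf = new String(tmpBuf);
        report(buf);
    }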

    /**
     * 1. Use InputStreamReader so that the JVM performs the character
     *    encoding conversion (here with the platform default encoding).
     * 2. Do not rely on the size of the file (measurable only in
     *    bytes, not in number of chars). Instead, read chars into a
     *    fixed-size buffer and accumulate them in a StringBuffer.
     */
    private void doReplaceI18N(File f) throws Exception {
        FileInputStream fis = new FileInputStream(f);
        BufferedReader br = new BufferedReader(new InputStreamReader(fis));
        try {
            char[] tmpBuf = new char[BUFFER_SIZE];
            int numread = 0;
            StringBuffer sbuff = new StringBuffer();
            while (numread != -1) {
                numread = br.read(tmpBuf, 0, BUFFER_SIZE);
                if (numread != -1) {
                    sbuff.append(tmpBuf, 0, numread);
                }
            }
            report(sbuff.toString());
        } finally {
            br.close(); // also closes the underlying FileInputStream
        }
    }


    private void report(String s) {
        PrintWriter pw = new PrintWriter(System.out, true);
        pw.println("----------file------------");
        pw.println(s);
        pw.println("--------------------------");
        pw.println();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.out.println("Usage: java demo.ReplaceRegExpRefactoring "
                + "<inputfile>");
        }
        else {
            ReplaceRegExpRefactoring instance
                = new ReplaceRegExpRefactoring();
            File f = new File(args[0]);
            try {
                System.out.println("... invoked doReplaceAsOriginal("
                    + args[0] + ")");
                instance.doReplaceAsOriginal(f);
            } catch (Exception e) {
                e.printStackTrace();
            }
            try {
                System.out.println("... invoked doReplaceI18N("
                    + args[0] + ")");
                instance.doReplaceI18N(f);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }
}
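
The demo above decodes with the platform default encoding. Where the file's encoding is known and differs from the default, one possible variation is to pass it explicitly. The following is only a sketch under that assumption: the class name ReadWithEncoding and the readAll method are hypothetical, nothing here is taken from Ant itself, and it relies on Java features (try-with-resources, Charset) that are newer than JDK 1.3.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;

// Hypothetical helper: reads a whole stream as text in a named encoding.
final class ReadWithEncoding {

    private static final int BUFFER_SIZE = 1000; // chars, not bytes

    static String readAll(InputStream in, Charset encoding) {
        try (Reader r = new BufferedReader(new InputStreamReader(in, encoding))) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[BUFFER_SIZE];
            int n;
            // Always read into the start of the fixed-size buffer and
            // append only the chars actually delivered.
            while ((n = r.read(buf, 0, buf.length)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.out.println("Usage: java ReadWithEncoding <inputfile> <encoding>");
            return;
        }
        // Whether a given name such as Shift_JIS is accepted depends on
        // the charsets installed with the JVM.
        Charset encoding = Charset.forName(args[1]);
        System.out.println(readAll(new FileInputStream(args[0]), encoding));
    }
}
```

The design choice is the same as in doReplaceI18N: the buffer size is independent of the file length, so it does not matter how many bytes each character occupies on disk.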





