ant-user mailing list archives

From matsuha...@quick.co.jp
Subject ReplaceRegExp task, an I18N problem again in byline="true" case, a proposed fix, an example removing whitespace in HTML/JSP
Date Mon, 28 Apr 2003 12:25:06 GMT

Hello, I would like to report an I18N problem in the ReplaceRegExp task.

I am using Ant 1.5.3, JDK 1.4.1_01 and Windows 2000.

I ran the following build:

        <replaceregexp match="&gt;\s*(&lt;td|&lt;th|&lt;/tr)" replace="&gt;\1"
                       flags="g" byline="false">
            <fileset dir="${etc.home}/fit0" includes="**/*.html, **/*.jsp"/>
        </replaceregexp>

where

(1) the input files given to the task contain non-ASCII, double-byte
characters, and

(2) byline="false" is specified,

then I got an IndexOutOfBoundsException. This is the same problem I
reported previously, in January of this year.

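I looked at the Java source of
org.apache.tools.ant.taskdefs.optional.ReplaceRegExp
and found that the task still has an I18N problem when byline="false" is
specified. In that case it does not read the file character by character
through the Reader; instead it allocates a char buffer of f.length()
elements, i.e. the file size in bytes, and fills it with
br.read(tmpBuf, totread, flen).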
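To see why a buffer sized in bytes breaks, here is a minimal, self-contained sketch that copies the original byline="false" loop. A byte array stands in for the file, and UTF-8 stands in for the file encoding only because every JVM must support it (an assumption; any encoding in which some characters take more than one byte behaves the same way). The class ByteLengthBug and the method readOriginal are hypothetical names for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ByteLengthBug {
    // Copy of the original byline="false" loop; the byte array plays the
    // role of the file, so bytes.length plays the role of f.length().
    public static String readOriginal(byte[] bytes, String encoding) throws IOException {
        BufferedReader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), encoding));
        int flen = bytes.length;          // size in BYTES, not chars
        char[] tmpBuf = new char[flen];
        int numread = 0;
        int totread = 0;
        while (numread != -1 && totread < flen) {
            // On a second pass off = totread > 0 but len is still flen,
            // so off + len > tmpBuf.length and read() throws.
            numread = br.read(tmpBuf, totread, flen);
            totread += numread;
        }
        return new String(tmpBuf);
    }

    public static void main(String[] args) throws IOException {
        // ASCII: 13 bytes decode to 13 chars, so one read() fills the buffer.
        byte[] ascii = "<td>Test</td>".getBytes("UTF-8");
        System.out.println(readOriginal(ascii, "UTF-8"));

        // Non-ASCII: 18 bytes decode to only 12 chars.
        byte[] japanese = "<td>\u30c6\u30b9\u30c8</td>".getBytes("UTF-8");
        try {
            readOriginal(japanese, "UTF-8");
            System.out.println("no exception");
        } catch (IndexOutOfBoundsException e) {
            System.out.println("IndexOutOfBoundsException");
        }
    }
}
```

With the ASCII input the loop ends after one read(). With the Japanese input the first read() returns at most 12 chars, so totread stays below flen and the second call fails its bounds check before reading anything.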


I tried fixing the problem, and the fix seems to work. Here is the
modification. I made a class demo.ReplaceRegExpHacked, which is a copy of
the original ReplaceRegExp. In ReplaceRegExpHacked#doReplace(File,int),
the else branch of if (byline) now reads:
--------------------------------------------------------------------------
            if (byline) {
                .................................
            } else {
                // hacking by K.MATSUHASHI
                // Read the whole file through the Reader, one char at a time,
                // so the result no longer depends on the file length in bytes.
                StringBuffer docbuff = new StringBuffer();
                int c;
                while ((c = br.read()) != -1) {
                    docbuff.append((char) c);
                }
                String buf = docbuff.toString();
                String res = doReplace(regex, subs, buf, options);
                if (!res.equals(buf)) {
                    changes = true;
                }
                pw.print(res);
                pw.flush();

                //System.out.println("regex=\"" + regex.getPattern(getProject()) + "\"");
                //System.out.println("subs=\""  + subs.getExpression(getProject()) + "\"");
                //System.out.println("options=" + options);
                //System.out.println("buf.length()=" + buf.length());
                //System.out.println("res.length()=" + res.length());

                // Why the original below fails: flen is the file length in
                // BYTES. Double-byte text decodes to fewer chars, so the first
                // read() returns fewer than flen chars and the loop calls
                // read(tmpBuf, totread, flen) again; off + len then exceeds
                // tmpBuf.length and read() throws IndexOutOfBoundsException.
                /* the following is the original **********************
                int flen = (int) f.length();
                char tmpBuf[] = new char[flen];
                int numread = 0;
                int totread = 0;

                while (numread != -1 && totread < flen) {
                    numread = br.read(tmpBuf, totread, flen);
                    totread += numread;
                }

                String buf = new String(tmpBuf);

                String res = doReplace(regex, subs, buf, options);

                if (!res.equals(buf)) {
                    changes = true;
                }

                pw.print(res);
                pw.flush();
                */
            }

--------------------------------------------------------------------------
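The hack reads one char per read() call. A variant that reads in chunks could look like the following; readFully and ReadFullySketch are hypothetical names, not part of Ant. Like the hack, it never uses f.length(), and since off + len never exceeds the chunk array, the bounds check that fails in the original cannot fail here. It uses only APIs that already existed in JDK 1.4.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class ReadFullySketch {
    // Drain a Reader into a String. The buffer grows with the number of
    // chars actually decoded; the byte length of the source is never used.
    public static String readFully(Reader r) throws IOException {
        StringBuffer sb = new StringBuffer();
        char[] chunk = new char[8192];
        int n;
        // read() returns -1 at end of stream; off + len always fits in chunk.
        while ((n = r.read(chunk, 0, chunk.length)) != -1) {
            sb.append(chunk, 0, n);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = "<td>\u30c6\u30b9\u30c8</td>".getBytes("UTF-8");
        Reader br = new BufferedReader(new InputStreamReader(
                new ByteArrayInputStream(bytes), "UTF-8"));
        String s = readFully(br);
        System.out.println(bytes.length + " bytes, " + s.length() + " chars"); // 18 bytes, 12 chars
    }
}
```

In the hacked doReplace, the while loop over br.read() would then become String buf = readFully(br); with the rest unchanged.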


After fixing the IndexOutOfBoundsException, I was able to write a useful
build that removes insignificant whitespace from HTML/JSP code. In case it
is of interest, I paste the build fragment here. byline="false" is what
lets it pull the <TR>, <TD> and <TH> tags onto one line: the whole file is
matched as one string, so \s* can also match the line breaks between tags.

--------------------------------------------------------------------------
    <!-- ==================== Taskdef ================================ -->
    <target name="taskdef">
        <taskdef name="replaceregexphacked"
                classname="demo.ReplaceRegExpHacked">
            <classpath refid="app.classpath"/>
        </taskdef>
    </target>

    <!-- == HTML/JSP fitness by hacked ReplaceRegExp : level 0  == -->
    <!--
        collapse each run of whitespace within a line into a single blank
    -->
    <target name="fitness0" depends="prepare,compile,taskdef">
        <copy todir="${etc.home}/fit0">
            <fileset dir="${web.home}" includes="**/*.html, **/*.jsp"/>
        </copy>
        <replaceregexphacked match="\s+" replace=" " flags="g" byline="true">
            <fileset dir="${etc.home}/fit0" includes="**/*.html, **/*.jsp"/>
        </replaceregexphacked>
    </target>


    <!-- == HTML/JSP fitness by hacked ReplaceRegExp : level 1  == -->
    <!--
        move &lt;tr&gt; , &lt;td&gt; , &lt;th&gt; tags onto one line
    -->
    <target name="fitness1" depends="fitness0">
        <copy todir="${etc.home}/fit1">
            <fileset dir="${etc.home}/fit0" includes="**/*.html, **/*.jsp"/>
        </copy>
        <replaceregexphacked match="&gt;\s*(&lt;td|&lt;th|&lt;/tr)"
                             replace="&gt;\1" flags="gs" byline="false">
            <fileset dir="${etc.home}/fit1" includes="**/*.html, **/*.jsp"/>
        </replaceregexphacked>
    </target>
--------------------------------------------------------------------------


This build shrank a sample HTML file from 26 KB to 15 KB, which is good
enough for me.

-----------------<the original looks like this>--------------------------------------

<table width="630" border="0" cellspacing="0" cellpadding="0">
          <tr>
                    <td class="pan"><a href="showMarketReport.do">Test</a> &gt;
</td>

                    <td align="right" class="datadate">Test</td>

          </tr>
</table>
------------------------------------------------------------------------------



-----------------<the artifact looks like this>--------------------------------------
<table width="630" border="0" cellspacing="0" cellpadding="0">
 <tr><td class="pan"><a href="showMarketReport.do">Test</a> &gt;
</td><td align="right" class="datadate">Test</td></tr>
</table>
------------------------------------------------------------------------------
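The effect of the two targets on the sample can be reproduced outside Ant with java.util.regex. This is a rough sketch under stated assumptions: fitness0 and fitness1 are hypothetical helpers mirroring the two targets, lines are joined with "\n", the indentation of the sample is approximated, and the group reference is written $1 because that is java.util.regex's replacement syntax (the Ant attribute uses \1). The second pass only works because it sees the whole document as one string, which is what byline="false" provides.

```java
import java.util.regex.Pattern;

public class FitnessSketch {
    // Like target "fitness0" (byline="true"): collapse each run of
    // whitespace within a line to one blank; line breaks are kept.
    private static final Pattern WS = Pattern.compile("\\s+");
    // Like target "fitness1" (byline="false"): the whole document is one
    // string, so \s* can also swallow the line breaks before the tags.
    private static final Pattern TAG_GAP = Pattern.compile(">\\s*(<td|<th|</tr)");

    public static String fitness0(String doc) {
        String[] lines = doc.split("\n", -1);   // -1 keeps trailing empty lines
        StringBuffer out = new StringBuffer();
        for (int i = 0; i < lines.length; i++) {
            if (i > 0) out.append('\n');
            out.append(WS.matcher(lines[i]).replaceAll(" "));
        }
        return out.toString();
    }

    public static String fitness1(String doc) {
        return TAG_GAP.matcher(doc).replaceAll(">$1");
    }

    public static void main(String[] args) {
        String original =
              "<table width=\"630\" border=\"0\" cellspacing=\"0\" cellpadding=\"0\">\n"
            + "          <tr>\n"
            + "                    <td class=\"pan\"><a href=\"showMarketReport.do\">Test</a> &gt;\n"
            + "</td>\n"
            + "\n"
            + "                    <td align=\"right\" class=\"datadate\">Test</td>\n"
            + "\n"
            + "          </tr>\n"
            + "</table>";
        System.out.println(fitness1(fitness0(original)));
    }
}
```

Running main prints the same four lines as the artifact above. Note that <tr> itself is not in the alternation, so it stays on its own line; only the <td>, <th> and </tr> that follow a ">" are pulled up.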


I hope this fix can be incorporated into the original ReplaceRegExp.





     MATSUHASHI,kazuaki
     Japan

