ant-user mailing list archives

From Jan.Mate...@rzf.fin-nrw.de
Subject AW: ReplaceRegExp task, an I18N problem again in byline="true" case, a proposed fix, an example removing whitespace in HTML/JSP
Date Mon, 28 Apr 2003 12:32:46 GMT
What about committing the patch to Bugzilla?
The test case would be especially interesting. If there is an I18N data file
in the source tree,
regression tests will cover this in future.


Jan Matèrne

> -----Original Message-----
> From: matsuhashi@quick.co.jp [mailto:matsuhashi@quick.co.jp]
> Sent: Monday, 28 April 2003 14:25
> To: Ant Users List
> Subject: ReplaceRegExp task, an I18N problem again in byline="true"
> case, a proposed fix, an example removing whitespace in HTML/JSP
> 
> 
> Hello, I would like to report an I18N problem in the ReplaceRegExp task.
> 
> I use Ant 1.5.3, JDK 1.4.1_01, Win2000.
> 
> I ran the following build:
> 
>        <replaceregexp match="&gt;\s*(&lt;td|&lt;th|&lt;/tr)" replace="&gt;\1"
>                       flags="g" byline="false">
>             <fileset dir="${etc.home}/fit0" includes="**/*.html, **/*.jsp"/>
>         </replaceregexp>
> 
> where
> 
> (1) the input files given to the task contain non-ASCII, 2-byte
> characters
> 
> (2) byline="false" is specified.
> 
> then I got an IndexOutOfBoundsException. This is the same phenomenon
> that I reported previously, in January this year.
> 
> I looked at the Java source of
> org.apache.tools.ant.taskdefs.optional.ReplaceRegExp
> and found that the ReplaceRegExp task still has an I18N problem when
> byline="false" is specified.
> In the byline="false" case, it does not use Reader; it is using
> file.read();
> 
> 
> I tried fixing the problem, and it seems to be successful. Here I paste the
> modification.
> I made a demo.ReplaceRegExpHacked class, which mimics the original
> ReplaceRegExp.
> In
> ReplaceRegExpHacked#doReplace(RegularExpression,Substitution,String,int),
> --------------------------------------------------------------
> ------------
> if (byline) {
>                                 .................................
>             } else {
>                 // hacking by K.MATSUHASHI
>                 StringBuffer docbuff = new StringBuffer();
>                 int c;
>                 do {
>                     c = br.read();
>                     if (c >= 0) {
>                         docbuff.append( (char)c );
>                     }
>                 } while(c >= 0);
>                 String buf = docbuff.toString();
>                 String res = doReplace(regex, subs, buf, options);
>                 if (!res.equals(buf)) {
>                     changes = true;
>                 }
>                 pw.print(res);
>                 pw.flush();
> 
>                 //System.out.println("regex=\"" + regex.getPattern(getProject()) + "\"");
>                 //System.out.println("subs=\""  + subs.getExpression(getProject()) + "\"");
>                 //System.out.println("options=" + options);
>                 //System.out.println("buf.length()=" + buf.length());
>                 //System.out.println("res.length()=" + res.length());
> 
>                 /* the following is the original **********************
>                 int flen = (int) f.length();
>                 char tmpBuf[] = new char[flen];
>                 int numread = 0;
>                 int totread = 0;
> 
>                 while (numread != -1 && totread < flen) {
>                     numread = br.read(tmpBuf, totread, flen);
>                     totread += numread;
>                 }
> 
>                 String buf = new String(tmpBuf);
> 
>                 String res = doReplace(regex, subs, buf, options);
> 
>                 if (!res.equals(buf)) {
>                     changes = true;
>                 }
> 
>                 pw.print(res);
>                 pw.flush();
>                 */
>             }
> 
> --------------------------------------------------------------
> ------------
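
One possible reading of why the original loop fails: f.length() is a byte count, while br.read(...) fills tmpBuf with chars. With multi-byte input the first read returns fewer chars than flen, so the loop calls read again with offset totread and length flen, and offset plus length then exceeds the array size. The sketch below reads in fixed-size chunks instead, so it never depends on the file length. The class and method names are hypothetical, and it assumes a newer JDK than the 1.4.1 mentioned above (StringBuilder, StandardCharsets).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Ant: reads a whole Reader into a String.
class ReadAllSketch {

    /** Reads everything from the reader without assuming chars == bytes. */
    static String readAll(Reader in) throws IOException {
        StringBuilder doc = new StringBuilder();
        char[] chunk = new char[4096];
        int n;
        // read() returns the number of chars actually read, or -1 at end of stream;
        // offset and length always stay inside the chunk array.
        while ((n = in.read(chunk, 0, chunk.length)) != -1) {
            doc.append(chunk, 0, n);
        }
        return doc.toString();
    }

    public static void main(String[] args) throws IOException {
        // Three Japanese characters (written as escapes) between two ASCII tags.
        String text = "<td>\u65E5\u672C\u8A9E</td>";
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        Reader in = new InputStreamReader(new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
        String back = readAll(in);
        // The byte count is larger than the char count for multi-byte text.
        System.out.println("bytes=" + bytes.length + " chars=" + back.length());
        // prints: bytes=18 chars=12
    }
}
```

Compared with the char-by-char loop in the hack above, this does the same job with fewer read calls; both avoid sizing a buffer from the file length.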
> 
> 
> After fixing the IndexOutOfBoundsException, I could write a useful build
> for removing insignificant whitespace in HTML/JSP code. In case it is of
> interest, I paste the build fragment here. The byline="false" feature is
> used to good effect to move the <TR>, <TD>, <TH> tags onto one line.
> 
> --------------------------------------------------------------
> ------------
>     <!-- ==================== Taskdef ================================ -->
>     <target name="taskdef">
>         <taskdef name="replaceregexphacked"
>                 classname="demo.ReplaceRegExpHacked">
>             <classpath refid="app.classpath"/>
>         </taskdef>
>     </target>
> 
> 
>     <!-- == HTML/JSP fitness by hacked ReplaceRegExp : level 0  == -->
>     <!--
>         turn each run of whitespace into a single blank character
>     -->
>     <target name="fitness0" depends="prepare,compile,taskdef">
>         <copy todir="${etc.home}/fit0">
>             <fileset dir="${web.home}" includes="**/*.html, **/*.jsp"/>
>         </copy>
>         <replaceregexphacked match="\s+" replace=" " flags="g" byline="true">
>             <fileset dir="${etc.home}/fit0" includes="**/*.html, **/*.jsp"/>
>         </replaceregexphacked>
>     </target>
> 
> 
>     <!-- == HTML/JSP fitness by hacked ReplaceRegExp : level 1  == -->
>     <!--
>         move &lt;tr&gt; , &lt;td&gt; , &lt;th&gt; tags onto one line
>     -->
>     <target name="fitness1" depends="fitness0">
>         <copy todir="${etc.home}/fit1">
>             <fileset dir="${etc.home}/fit0" includes="**/*.html, **/*.jsp"/>
>         </copy>
>         <replaceregexphacked match="&gt;\s*(&lt;td|&lt;th|&lt;/tr)" replace="&gt;\1"
>                              flags="gs" byline="false">
>             <fileset dir="${etc.home}/fit1" includes="**/*.html, **/*.jsp"/>
>         </replaceregexphacked>
>     </target>
> --------------------------------------------------------------
> ------------
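
For anyone who wants to try the level 1 expression outside Ant, here is a minimal java.util.regex sketch. It is an illustration under assumptions, not the task's implementation: the class name and the sample markup are made up, and `$1` is the java.util.regex spelling of the `\1` back-reference used in the build file.

```java
import java.util.regex.Pattern;

// Hypothetical demo class, not part of Ant.
class TagJoinSketch {

    // Same expression as in the fitness1 target, without the XML escaping.
    private static final Pattern GAP = Pattern.compile(">\\s*(<td|<th|</tr)");

    /** Removes whitespace between a closing '>' and a following <td, <th or </tr. */
    static String joinTags(String html) {
        // \s also matches line terminators, so the whole document can be
        // processed as a single string, as with byline="false".
        return GAP.matcher(html).replaceAll(">$1");
    }

    public static void main(String[] args) {
        String html = "<tr>\n  <td>A</td>\n\n  <td>B</td>\n</tr>";
        System.out.println(joinTags(html));
        // prints: <tr><td>A</td><td>B</td></tr>
    }
}
```

Whitespace in front of an opening `<tr` is left alone because `<tr` is not one of the alternatives in the group.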
> 
> 
> This build shortened a sample HTML file from 26 KB to 15 KB. This is
> good enough for me.
> 
> -----------------<the original looks like this>--------------------------------------
> 
> <table width="630" border="0" cellspacing="0" cellpadding="0">
>           <tr>
>                     <td class="pan"><a href="showMarketReport.do">Test</a> &gt;
> </td>
> 
>                     <td align="right" class="datadate">Test</td>
> 
>           </tr>
> </table>
> --------------------------------------------------------------
> ----------------
> 
> 
> 
> -----------------<the artifact looks like this>--------------------------------------
> <table width="630" border="0" cellspacing="0" cellpadding="0">
>  <tr><td class="pan"><a href="showMarketReport.do">Test</a> &gt;
> </td><td align="right" class="datadate">Test</td></tr>
> </table>
> --------------------------------------------------------------
> ----------------
> 
> 
> I hope this fix can be incorporated into the original ReplaceRegExp.
> 
> 
> 
> 
> 
>      MATSUHASHI,kazuaki
>      Japan
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: user-unsubscribe@ant.apache.org
> For additional commands, e-mail: user-help@ant.apache.org
> 
