ant-user mailing list archives

From Radomír Věncek <radomir.ven...@funwap.net>
Subject National characters filtering problem
Date Fri, 11 Jun 2004 09:40:03 GMT
Hi all!

 

I’m using the <copy> task to copy and localize (“nationalize”) the Java source code of a J2ME
application by replacing ordinary @patterns@. This process works correctly for most
national characters.

 

But some of them are encoded incorrectly.

I now want to make a Greek port of my application, and here is the problem: some characters
are filtered incorrectly.

 

I have an ordinary Java source file:

 

      public static final String [] cmdsLList =

      {

            "@loc.cmd.ok@",

//          "@loc.cmd.back@",

            "Πίσω", // direct Greek text „Πίσω“ (binary: ce a0  ce af  cf 83  cf 89)

            "@loc.cmd.pause@",

            "@loc.cmd.select@"

      };

 

I have a filter file, saved with Notepad in UTF-8 encoding:

 

loc.cmd.ok = OK

loc.cmd.back = Πίσω

loc.cmd.pause = Παύση

loc.cmd.select = Επιλέξτε

 

And this is the part of build.xml that copies and localizes the source code:

 

      <copy todir="Temp/Src" flatten="yes" filtering="true" includeemptydirs="false">

            <filterset begintoken="@" endtoken="@">

                  <filtersfile file="Temp/Locales/${bld.loc}.txt"/>

            </filterset>

            <fileset dir="Src" includes="**/*"/>

      </copy>

 

And the result is the following.

Original Java source:

 

            {

                        "@loc.cmd.ok@",

//                      "@loc.cmd.back@",

                        "Πίσω",

                        "@loc.cmd.pause@",

                        "@loc.cmd.select@"

            };

 

Copied and filtered Java source:

            {

                        "OK",

//                      "ΠίÏ?ω",

                        "ΠίÏ?ω",

                        "ΠαύÏ?η",

                        "Επιλέξτε"

            };

 

I can only hope that the characters in this mail are displayed correctly.

If not, please look at http://radarada.wz.cz/problem.html, which is a copy of this mail.

 

The problem is in the “loc.cmd.back” string (and of course in many others), in the σ (sigma)
character, which is encoded in UTF-8 as the byte sequence 0xcf 0x83. After filtering, the result
file contains the bytes 0xcf 0x3f, which are not interpreted and displayed correctly.

 

BUT the problem also occurs if I include the Greek text directly in the original Java source.
This text should NOT be filtered, only copied. Yet in this case too the binary representation
of the sigma character differs (byte 0x3f, the “?” character, instead of 0x83, shown as “ƒ”).
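
One possible explanation, offered as an assumption rather than something confirmed in this message: with filtering enabled, the file may be read and written as text in the platform default encoding instead of being copied byte for byte. If that default is a single-byte code page such as windows-1250 (a guess for a Central European Windows XP installation), the byte 0x83 has no character assigned to it, so it cannot survive a decode and re-encode cycle. The Java sketch below reproduces only that round trip; the class and method names are hypothetical and it does not call Ant at all.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RoundTripDemo {

    // Decode the raw bytes with the assumed charset, then encode them again.
    // This mimics a text-based copy that never sees the file as UTF-8.
    static byte[] roundTrip(byte[] input, Charset assumed) {
        String decoded = new String(input, assumed);
        return decoded.getBytes(assumed);
    }

    // Format bytes as lowercase hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // "Πίσω" written with Unicode escapes so the demo does not depend
        // on the encoding used to compile this file.
        byte[] utf8 = "\u03a0\u03af\u03c3\u03c9".getBytes(StandardCharsets.UTF_8);
        System.out.println("original:     " + hex(utf8));

        // Assumption: windows-1250 stands in for the platform default.
        Charset assumedDefault = Charset.forName("windows-1250");
        System.out.println("windows-1250: " + hex(roundTrip(utf8, assumedDefault)));

        // ISO-8859-1 maps every byte to a character, so nothing is lost.
        System.out.println("ISO-8859-1:   " + hex(roundTrip(utf8, StandardCharsets.ISO_8859_1)));
    }
}
```

If the assumption holds, the windows-1250 line should show 0x3f in place of 0x83 while the other bytes stay unchanged, which matches the symptom described above. In that case the `encoding` attribute that the <copy> task documents for filter-copying would be the first thing to check.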

 

If I comment out the <filterset> element, the file is copied normally and no change occurs (of course
the @patterns@ are not replaced), and the Greek text is displayed correctly.

 

WHY???

 

 

I’m working on Windows XP Professional, Version 2002, with Service Pack 1, and I’m using
Ant 1.5.3.

 

Regards,

   Rada

 
