ant-user mailing list archives

From "Bill Chmura" <>
Subject RE: Problem with FixCRLF
Date Wed, 04 Jun 2003 15:46:33 GMT

Well, it's a really weird thing, this whole Kanji thing.  We get the
content in a Word doc or text file.  Do you know how weird this stuff
looks in a text file???

Anywho...  We load it into Word so we can at least see it in Kanji, then
we add in the hypertext tags in English.

The whole thing gets processed later through a template system written
in Java, which we had problems with there also...

Long story short: skipping the LF fix for any file that has
charset=SHIFT_JIS in its content works for now.  It's a band-aid,
but the output seems okay in the browser after the upload.
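
A rough sketch of how a skip like that could be written in a build
file. It assumes the Ant release in use lets selectors be nested inside
<fixcrlf> (worth checking against the manual for that release), and the
shortened includes list is only for illustration. It is not necessarily
how the band-aid above was actually done.

```xml
<target name="makeunixlf">
  <fixcrlf srcdir="${webroot.dir}"
           includes="**/*.html, **/*.txt"
           defaultexcludes="yes">
    <!-- Assumption: nested selectors are supported by this task.
         Only process files that do NOT mention the Shift_JIS charset. -->
    <not>
      <contains text="charset=SHIFT_JIS" casesensitive="no"/>
    </not>
  </fixcrlf>
</target>
```

Files that declare the charset are simply left untouched, so their line
endings stay as they were on the Windows side.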


-----Original Message-----
From: Nathan Christiansen [] 
Sent: Wednesday, June 04, 2003 11:37 AM
To: Ant Users List
Subject: RE: Problem with FixCRLF

The problem with this reasoning is that no SJIS character contains
either 0x0D or 0x0A as one of its bytes; both of those are in the
control character range.

If <fixcrlf.../> was not recognizing the character set, none of the
Kanji would come through OK. (Learned after a frustrating week.)

You may need to use Microsoft's Code Page 932 (MS932 or Windows
Japanese) instead of SJIS for the encoding.  Shift-JIS is the standard
implementation, and MS932 is Microsoft's extension to the standard.
(Java supports several Japanese character encodings. See

MS Windows uses Microsoft's Code Page 932 (MS932) instead of the
standard Shift-JIS. However, every multi-byte character in SJIS is
encoded exactly the same way in MS932. MS932 just adds 360 more
multi-byte characters that SJIS reserves for IBM escape sequences.
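
As a sketch only: if the <fixcrlf> task in the Ant release being used
accepts an encoding attribute (please check the manual for that
release), trying MS932 might look roughly like this. The target name,
the html-only includes pattern and the eol="lf" setting are assumptions
for illustration and do not come from the original build file.

```xml
<!-- Hypothetical target: handle the Japanese pages separately and
     tell fixcrlf which encoding the files are in. -->
<target name="makeunixlf-japanese">
  <fixcrlf srcdir="${webroot.dir}"
           includes="**/*.html"
           encoding="MS932"
           eol="lf"
           defaultexcludes="yes"/>
</target>
```

Without an explicit encoding the task falls back to the JVM's default
encoding, which on a non-Japanese Windows box is unlikely to be MS932.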

-- Nathan Christiansen
   Tahitian Noni International

-----Original Message-----
From: W. Sean Hennessy []
Sent: Tuesday, June 03, 2003 4:13 PM
To: Ant Users List
Subject: RE: Problem with FixCRLF

I'll wager the fixcrlf task is not multi-byte char set capable. It might
not distinguish between UTF-8 and UTF-16 encoded files. This means any
CR LF combination (0x0D 0x0A) at the byte level is being converted to
just LF (0x0A), hence the corruption of your SHIFT_JIS files, in which
any single char represented by the two-byte combination 0x0D 0x0A is
being converted to 0x0A by fixcrlf.

-----Original Message-----
From: Bill Chmura []
Sent: Tuesday, June 03, 2003 1:48 PM
Subject: Problem with FixCRLF

On Sun JDK 1.4.1_02 / Ant 1.5.1

I have a task that goes through a directory on a Windows box and
converts all the CR/LF line endings to Unix linefeeds, so that when I
archive it into a tar.gz and upload it, it's all ready on the Unix end.

Here is the code

<target name="makeunixlf">
  <fixcrlf srcdir="${webroot.dir}"
           includes="**/*.html, **/*.css, **/*.txt, **/*.sh, **/*.js, **/*.cgi, **/*.pl, **/*.pm"
           defaultexcludes="yes"/>
</target>

This works great, fantastic, etc... Everything I hoped it would be.

The problem I noticed is that I have some web pages that are in the
SHIFT_JIS (Japanese) character set.  When I run these pages through, it
mangles a little bit of the text (little enough that I did not notice
it at first).  Now, I cannot read Japanese, so it could be converting
everything into huge profanities for all I know.  I do know that the
files are different after I run makeunixlf than they were before.

Traditionally these files have been posted through FTP, so I am not sure
why the conversion is any different.  It should still be the same.

Any ideas?

PS.  The makeunixlf target is in a shared library we have for Ant, so I
cannot modify it to just exclude certain files unless I pass the
excludes in as a variable, with a default for everyone else...
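
One way the pass-it-in-as-a-variable idea could look, sketched under
the assumption that the shared file is allowed to define a default
property. The property name makeunixlf.excludes is made up for
illustration. Because Ant properties cannot be changed once set, a
value supplied before this default is reached takes precedence.

```xml
<!-- In the shared library: empty default, used only when the caller
     has not already set the (hypothetical) property. -->
<property name="makeunixlf.excludes" value=""/>

<target name="makeunixlf">
  <fixcrlf srcdir="${webroot.dir}"
           includes="**/*.html, **/*.css, **/*.txt, **/*.sh, **/*.js, **/*.cgi, **/*.pl, **/*.pm"
           excludes="${makeunixlf.excludes}"
           defaultexcludes="yes"/>
</target>
```

A project with Japanese pages could then pass something like
-Dmakeunixlf.excludes=ja/** on the command line (the ja/ directory is
also just an example), while every other project keeps the current
behaviour.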



William B Chmura
Director of Internet Technology
Explosivo Internet Technology Group
Tel: (888) 560-YWEB

To unsubscribe, e-mail:
For additional commands, e-mail:
