ant-user mailing list archives

From "Bill Chmura" <B...@Explosivo.com>
Subject RE: Problem with FixCRLF
Date Wed, 04 Jun 2003 16:04:52 GMT

Okay, in my template program (combination ant, velocity and a custom
task) I set the output to SHIFT_JIS...  I don't know if that changes
things?

It's weird, and I know I am probably doing things wrong, but it's all I
could do.

I can give it a try



-----Original Message-----
From: Nathan Christiansen [mailto:Nathan_Christiansen@tni.com] 
Sent: Wednesday, June 04, 2003 11:56 AM
To: Ant Users List
Subject: RE: Problem with FixCRLF


All Microsoft products will call the encoding that they are using
Shift-JIS, however Java calls the same encoding MS932.

So... 

In your <fixcrlf> task that is doing the Japanese pages (thanks to W.
Sean Hennessy for the code to distinguish) you would be better off using:

<fixcrlf encoding="MS932" .../>

instead of:

<fixcrlf encoding="SJIS" .../>
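
A fuller sketch of that suggestion, for illustration only. It assumes the
Japanese pages can be recognised by a charset=SHIFT_JIS declaration in
their content (the test described in the quoted message below) and uses
Ant's <contains> selector for that, which is not necessarily the code Sean
posted. The target name makeunixlf-japanese is made up.

```xml
<!-- Hypothetical target name; only the Shift-JIS pages are touched here. -->
<target name="makeunixlf-japanese">
  <fixcrlf srcdir="${webroot.dir}"
           eol="lf"
           encoding="MS932"
           includes="**/*.html">
    <!-- Assumption: Japanese pages declare their charset in the markup. -->
    <contains text="charset=SHIFT_JIS" casesensitive="no"/>
  </fixcrlf>
</target>
```

Nested selectors need Ant 1.5 or later.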


-- Nathan Christiansen
   Tahitian Noni International
   http://www.tahitiannoni.com


-----Original Message-----
From: Bill Chmura [mailto:Bill@Explosivo.com]
Sent: Wednesday, June 04, 2003 9:47 AM
To: 'Ant Users List'
Subject: RE: Problem with FixCRLF



Well it's a really weird thing this whole Kanji thing.  We get the
content in a word doc or text file.  Do you know how weird this stuff
looks in a text file???

Anywho...  We load it into Word so we can at least see it in Kanji, then
we add in the hypertext tags in English.

The whole thing gets processed later through a template system written
in Java, which we had problems with there also...

Long story short: skipping the line-ending fix for any file whose
content contains charset=SHIFT_JIS works for now.  It's a band-aid,
but the output seems okay in the browser after the upload.
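
For reference, one way such a skip could be written with Ant selectors.
This is a sketch and not necessarily the code actually in use: it assumes
Ant 1.5 or later, and the <not>/<contains> pair is only a guess at how the
charset test is done.

```xml
<target name="makeunixlf">
  <fixcrlf srcdir="${webroot.dir}"
           eol="lf"
           includes="**/*.html, **/*.css, **/*.txt, **/*.sh, **/*.js, **/*.cgi, **/*.pl, **/*.pm">
    <!-- Leave any file that declares Shift-JIS untouched. -->
    <not>
      <contains text="charset=SHIFT_JIS" casesensitive="no"/>
    </not>
  </fixcrlf>
</target>
```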

Phooey



-----Original Message-----
From: Nathan Christiansen [mailto:Nathan_Christiansen@tni.com] 
Sent: Wednesday, June 04, 2003 11:37 AM
To: Ant Users List
Subject: RE: Problem with FixCRLF


The problem with this reasoning is that there are no SJIS characters
that begin with either 0x0D or 0x0A; both of those are in the control
character range.

If <fixcrlf.../> was not recognizing the character set, none of the
Kanji would come through OK. (Learned after a frustrating week.)

<MyCharsetExperience>
You may need to use Microsoft's Code Page 932 (MS932 or Windows
Japanese) instead of SJIS for the encoding.  Shift-JIS is the standard
implementation, and MS932 is Microsoft's extension to the standard.
(Java supports several Japanese character encodings. See
http://java.sun.com/j2se/1.4.1/docs/guide/intl/encoding.doc.html)

MS Windows uses Microsoft's Code Page 932 (MS932) instead of the
standard Shift-JIS. However, every multi-byte character in SJIS is
encoded exactly the same way in MS932. MS932 just adds 360 more
multi-byte characters that SJIS reserves for IBM escape sequences.
</MyCharsetExperience>


-- Nathan Christiansen
   Tahitian Noni International
   http://www.tahitiannoni.com


-----Original Message-----
From: W. Sean Hennessy [mailto:shennessy@goldenhourdata.com]
Sent: Tuesday, June 03, 2003 4:13 PM
To: Ant Users List
Subject: RE: Problem with FixCRLF


I'll wager the fixcrlf task is not multi-byte char set capable. It might
not distinguish between UTF-8 and UTF-16 encoded files. This means any
CR LF combination (0x0D 0x0A) at the byte level is being converted to
just LF (0x0A), hence the corruption of your SHIFT_JIS files: any single
char represented by the two-byte combination 0x0D0A is being converted
to 0x0A by fixcrlf.



-----Original Message-----
From: Bill Chmura [mailto:Bill@Explosivo.com]
Sent: Tuesday, June 03, 2003 1:48 PM
To: user@ant.apache.org
Subject: Problem with FixCRLF



On Sun JDK 1.4.1_02 / Ant 1.5.1

I have a task that goes through a directory on a Windows box and
converts all the CR/LF line endings into Unix linefeeds, so when I
archive it into a tar.gz and upload it, it's all ready on the Unix end.

Here is the code

<target name="makeunixlf">
  <fixcrlf srcdir="${webroot.dir}"
           eol="lf"
           javafiles="no"
           includes="**/*.html, **/*.css, **/*.txt, **/*.sh, **/*.js, **/*.cgi, **/*.pl, **/*.pm"
           defaultexcludes="yes"/>
</target>

This works great, fantastic, etc... Everything I hoped it would be.

The problem I noticed is that I have some web pages that are in the
SHIFT_JIS (Japanese) character set.  When I run these pages through it,
it mangles a little bit of the text (little enough that I did not notice
it at first).  Now, I cannot read Japanese, so it could be converting
everything into huge profanities for all I know.  I do know that the
files are different after I run the makeunixlf above than they were
before.

Traditionally these files have been posted thru FTP, so I am not sure
why the conversion is any different.  It should still be the same,
right?

Any ideas?

PS.  The makeunixlf is in a shared library we have for ant, so I cannot
modify it to just exclude certain files unless I pass it in as a
variable to use as a default exclude...
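
A sketch of that variable idea. The property name makeunixlf.excludes is
invented here. Ant properties cannot be overwritten once set, so the empty
default inside the target only applies when the calling build file has not
set the property first.

```xml
<target name="makeunixlf">
  <!-- Hypothetical property; a caller may set it before this target runs. -->
  <property name="makeunixlf.excludes" value=""/>
  <fixcrlf srcdir="${webroot.dir}"
           eol="lf"
           javafiles="no"
           includes="**/*.html, **/*.css, **/*.txt, **/*.sh, **/*.js, **/*.cgi, **/*.pl, **/*.pm"
           excludes="${makeunixlf.excludes}"
           defaultexcludes="yes"/>
</target>
```

A project with Japanese pages would then define the property (for example
with a pattern such as ja/**, also made up) before calling the shared
target.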


TIA

Bill



William B Chmura
Director of Internet Technology
Explosivo Internet Technology Group
http://www.Explosivo.com
Tel: (888) 560-YWEB



---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@ant.apache.org
For additional commands, e-mail: user-help@ant.apache.org



