axis-java-dev mailing list archives

From "Ias" <iasan...@hotmail.com>
Subject RE: UTF8Encoder question...
Date Tue, 28 Dec 2004 16:53:34 GMT

	From: Jongjin Choi [mailto:gunsnroz@hotmail.com] 
	Sent: Tuesday, December 28, 2004 11:56 AM
	To: axis-dev@ws.apache.org
	Subject: UTF8Encoder question...
	
	
	Dims and all, 
	 
	UTF8Encoder writes an escaped string when a character is above 0x7F. 
	The escaping does not seem to be necessary, because 
	a Writer (not an OutputStream) is used. 
	 
	I think this could be just (line 86):
	 
	writer.write(character);
	 
	instead of (lines 86-88):
	writer.write("&#x");
	writer.write(Integer.toHexString(character).toUpperCase());
	writer.write(";");
	 
	The escaping just increases the message size.

Yes, it does. However, I think representing a character whose code point is
above 0x7F as an &#x XML character reference is one of the aims of the
encoder, because some systems can't display such a character properly when
they lack fonts covering the full Unicode range. If it's 100% certain that
every node in a messaging system has no problem with characters written
"as is" in an XML instance, it would be more efficient to use a compact
encoder, as you pointed out, instead of UTF8Encoder. Interestingly,
AbstractXMLEncoder (which is not instantiable) works that way.
Consequently, it would be a good idea to create a new encoder that optimizes
message size and to make it easy to select through configuration. (Yes, we
can recommend it to users dealing with non-Latin character systems :-)
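
To make the size trade-off concrete, here is a minimal sketch of the two
strategies. It is not the Axis source: the class EscapeSketch and its methods
are hypothetical names made up for illustration, it walks the string by code
point (an assumption on my part, so that surrogate pairs are not escaped
half by half), and it ignores markup characters such as < and &.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the actual UTF8Encoder.
public class EscapeSketch {

    // Writes every code point above 0x7F as &#xHEX; and everything else as is.
    static void writeEscaped(Writer writer, String text) throws IOException {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            i += Character.charCount(codePoint);
            if (codePoint > 0x7F) {
                writer.write("&#x");
                writer.write(Integer.toHexString(codePoint).toUpperCase());
                writer.write(";");
            } else {
                writer.write(codePoint);
            }
        }
    }

    // Convenience wrapper that collects the escaped form in a String.
    static String escaped(String text) {
        StringWriter out = new StringWriter();
        try {
            writeEscaped(out, text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "A\uD55C\uAE00"; // 'A' followed by two Hangul syllables
        String esc = escaped(text);
        System.out.println(esc); // prints A&#xD55C;&#xAE00;
        // Compare the sizes once both forms are encoded as UTF-8.
        System.out.println(text.getBytes(StandardCharsets.UTF_8).length
                + " bytes as is, "
                + esc.getBytes(StandardCharsets.UTF_8).length
                + " bytes escaped"); // prints 7 bytes as is, 17 bytes escaped
    }
}
```

For this sample the escaped form is more than twice as large on the wire,
which is the cost a compact encoder would avoid.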

Happy new year,

Ias

P.S. I'm going to switch iasandcb@hotmail.com to iasandcb@gmail.com (soon,
very soon).

	 
	If an OutputStream is used, the escaping or UTF-8 conversion (which
existed in the old UTF8Encoder.java) will be needed.
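
The point about streams can be sketched as follows, assuming the bytes are
meant to be UTF-8. The class StreamSketch and the method toUtf8Bytes are
hypothetical names, not part of Axis. An OutputStream only accepts bytes, so
something has to turn characters into bytes; here the standard
OutputStreamWriter does the conversion, which also means that writing
characters unescaped is only safe when the Writer's charset can represent
them.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not the actual Axis code.
public class StreamSketch {

    // Wraps a byte stream in a Writer that performs the UTF-8 conversion.
    static byte[] toUtf8Bytes(String text) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            writer.write(text); // characters go in unescaped
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray(); // the writer is closed and flushed by now
    }

    public static void main(String[] args) {
        byte[] encoded = toUtf8Bytes("\uD55C"); // one Hangul syllable
        System.out.println(encoded.length);     // prints 3
    }
}
```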
	 
	Thoughts?
	 
	/Jongjin

