poi-dev mailing list archives

From Alex Geller ...@4js.com>
Subject Re: Apache POI 3.8 (SXSSFWorkbook) - Unreadable Content
Date Thu, 04 Aug 2011 09:26:26 GMT
One more thought on this. In XML, only the following Unicode characters are
allowed (see http://www.w3.org/TR/xml/#NT-Char):
Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
Java strings, on the other hand, can apparently hold any sequence of 16-bit
code units, including unpaired surrogates and the 66 reserved
"noncharacters" (see
http://www.unicode.org/versions/Unicode5.2.0/ch16.pdf#G19635).
This means that quite a number of characters (the 2048 surrogate code points
between D7FF and E000 alone) can cause similar problems. Note that encoding
these as character references is also illegal (see
http://www.w3.org/TR/xml/#dt-unparsed), so there is no solution for this, or
is there?
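
One possible workaround, as a minimal sketch only: test each code point against the Char production quoted above and substitute a placeholder before the string is written out. The class and method names (XmlCharFilter, isXmlChar, sanitize) are hypothetical, and replacing with U+FFFD is my own assumption; none of this is something POI is known to do.

```java
// Hypothetical helper, not part of POI: filters a Java string down to
// the characters allowed by the XML 1.0 Char production.
final class XmlCharFilter {

    // True if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Replaces every code point outside Char with U+FFFD (an assumed
    // policy; dropping the character or throwing would work as well).
    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // codePointAt joins valid surrogate pairs and returns a lone
            // surrogate unchanged, so unpaired ones are caught here.
            int cp = s.codePointAt(i);
            out.appendCodePoint(isXmlChar(cp) ? cp : 0xFFFD);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A NUL, an unpaired high surrogate and a non-breaking space.
        String dirty = "a\u0000b\uD800c\u00A0";
        String clean = sanitize(dirty);
        for (int i = 0; i < clean.length(); i++) {
            System.out.printf("U+%04X%n", (int) clean.charAt(i));
        }
    }
}
```

The check works on code points rather than on individual char values so that a correctly paired surrogate (a supplementary character) passes while a lone surrogate does not. The filter is lossy, so it only fits where dropping such characters from the cell text is acceptable.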
None of this applies to the "non-breaking space" character (U+00A0), though.
That character is legal in both Java and XML. I am actually wondering why it
is even encoded as a character reference.
Regards,
Alex 
