xml-general mailing list archives

From "Lyle Coder" <x_co...@hotmail.com>
Subject UTF-8 vs. markup
Date Sun, 20 May 2001 18:32:42 GMT
Hi,
This is probably a bit of a general XML question, but I'm using Xalan and
wanted to know the pros and cons of the following in Xalan too.

I'm parsing HTML and constructing a DOM from it.  My HTML parser produces
UTF-8 data.  My question is: when I parse text such as "&copy;" or
"&nbsp;", these have their own UTF-8 (and hence UTF-16) equivalents (for
example, a 2-byte sequence in UTF-8).  When I'm constructing my DOM,
should I put entity references such as &nbsp; in my DOM, or should I just
use the UTF-8 multibyte or UTF-16 2-byte sequences?
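
To make the two options concrete, here is a minimal sketch of the second
one (storing the characters themselves in the DOM). It uses Python's
standard xml.dom.minidom instead of Xalan, purely as an assumption to keep
the example short; the helper name decode_entity and the sample text are
made up for illustration.

```python
from html.entities import name2codepoint
from xml.dom.minidom import getDOMImplementation


def decode_entity(name):
    """Map an HTML entity name such as 'nbsp' to the character it denotes."""
    return chr(name2codepoint[name])


nbsp = decode_entity("nbsp")  # U+00A0 NO-BREAK SPACE
copy = decode_entity("copy")  # U+00A9 COPYRIGHT SIGN

# Each character takes two bytes in UTF-8 and one 2-byte unit in UTF-16.
nbsp_utf8 = nbsp.encode("utf-8")      # b'\xc2\xa0'
copy_utf8 = copy.encode("utf-8")      # b'\xc2\xa9'
nbsp_utf16 = nbsp.encode("utf-16-be")  # b'\x00\xa0'

# Store the decoded characters directly in a DOM text node.
doc = getDOMImplementation().createDocument(None, "p", None)
root = doc.documentElement
root.appendChild(doc.createTextNode(copy + nbsp + "2001"))

# Serialized as text, the characters appear literally.
as_text = root.toxml()

# Serialized to an encoding that cannot represent them, minidom writes
# numeric character references instead of named entities.
as_ascii = root.toxml(encoding="us-ascii")
```

In this sketch the DOM only ever holds characters; the byte encoding (or
a numeric reference) is chosen when the tree is written out, not when it
is built. Whether Xalan's serializer behaves the same way is not shown
here and would need to be checked against its documentation.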

Please advise

Thanks
Lyle
