Reply-To: general@xml.apache.org
From: "Lyle Coder" <x_coder@hotmail.com>
To: <general@xml.apache.org>
Cc: <xalan-dev@xml.apache.org>
References: <OF6F8BA2FF.461E45B4-ON85256A4C.000B4528@lotus.com>
Subject: UTF-8 vs. markup
Date: Sun, 20 May 2001 11:32:42 -0700
MIME-Version: 1.0
Content-Type: text/plain;	charset="iso-8859-1"
Content-Transfer-Encoding: 7bit
Message-ID: <OE641JBjbUf9Sp2WEnK00002e7a@hotmail.com>

Hi,
This is probably a bit of a general XML question, but I'm using xalan and
wanted to know the pluses and minuses of the following in xalan too.

I'm parsing HTML and constructing a DOM from it.  My HTML parser produces
UTF-8 data.  My question is: when I parse text such as "&copy;" or
"&nbsp;", these entities stand for Unicode characters (U+00A9 and U+00A0)
that have their own UTF-8 and UTF-16 encodings (for example, a 2-byte
sequence in UTF-8).  When I'm constructing my DOM, should I put &nbsp;
entity references in my DOM, or should I just use the UTF-8 multibyte or
UTF-16 2-byte sequences?
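To make the two options concrete, here is a minimal sketch in Python rather than Xalan-J. It assumes the HTML parser resolves entities the way the standard library's html.unescape does; the sample string is made up for illustration. It shows that each entity is one Unicode character whose byte length depends only on the output encoding. Putting plain characters in the DOM leaves that choice to the serializer, which writes them as UTF-8 bytes or, for an encoding that cannot represent them, as numeric character references. Note that XML itself predefines only five named entities (&amp;, &lt;, &gt;, &apos;, &quot;), so a literal "&nbsp;" in XML output would also need a DTD declaring it.

```python
import html
from xml.dom.minidom import getDOMImplementation

# Stand-in for the HTML parser (assumption): html.unescape resolves named
# HTML entities such as &copy; and &nbsp; to the characters they denote.
text = html.unescape("&copy; 2001&nbsp;Lyle")

# Each entity became a single Unicode code point; how many bytes it takes
# depends only on the encoding chosen when the text is written out.
for ch in ("\u00a9", "\u00a0"):
    print(f"U+{ord(ch):04X}",
          "UTF-8:", ch.encode("utf-8").hex(),
          "UTF-16BE:", ch.encode("utf-16-be").hex())

# Build a small DOM whose text node holds the plain characters,
# not entity-reference nodes.
doc = getDOMImplementation().createDocument(None, "p", None)
doc.documentElement.appendChild(doc.createTextNode(text))

# Serializing as UTF-8 writes the characters directly as multibyte sequences.
utf8_bytes = doc.toxml(encoding="utf-8")

# Serializing as US-ASCII: minidom falls back to numeric character
# references for characters the encoding cannot represent.
ascii_bytes = doc.toxml(encoding="us-ascii")

print(utf8_bytes)
print(ascii_bytes)
```

In neither output does "&nbsp;" appear. The DOM holds characters, and only the serializer decides how they are spelled. In XSLT terms, the analogous knob is the encoding attribute of xsl:output.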

Please advise

Thanks
Lyle
