From: "Lyle Coder"
Delivered-To: mailing list general@xml.apache.org
Subject: UTF-8 vs. markup
Date: Sun, 20 May 2001 11:32:42 -0700

Hi,

This is probably a bit of a general XML question, but I'm using Xalan and would like to know the pros and cons of each approach there as well.

I'm parsing HTML and constructing a DOM from it. My HTML parser produces UTF-8 data. When I parse text such as "&copy;" or "&nbsp;", these entities have their own UTF-8 (and hence UTF-16) equivalents (for example, a 2-byte sequence in UTF-8). When I'm constructing my DOM, should I use entity references such as &nbsp; in my DOM, or should I just use the UTF-8 multibyte or UTF-16 2-byte sequences?

Please advise.

Thanks,
Lyle
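
To make the two options in the question concrete, here is a minimal sketch in Python. It uses the standard-library xml.dom.minidom, not Xalan, so it only illustrates the general idea and says nothing about Xalan's own API. The helper names (encoded_lengths, build_document) and the sample text are made up for illustration. The sketch stores the actual characters U+00A9 and U+00A0 in a text node and then serializes the same DOM to two different output encodings.

```python
from xml.dom.minidom import getDOMImplementation

# The characters that the HTML entities &copy; and &nbsp; stand for.
COPYRIGHT = "\u00a9"  # copyright sign
NBSP = "\u00a0"       # no-break space


def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for a single character."""
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-be"))


def build_document(text):
    """Build a tiny DOM whose root element <p> holds `text` as one text node.

    Hypothetical helper for this sketch only.
    """
    doc = getDOMImplementation().createDocument(None, "p", None)
    doc.documentElement.appendChild(doc.createTextNode(text))
    return doc


# Store the real characters in the DOM, not entity references.
doc = build_document("Copyright" + NBSP + COPYRIGHT + " 2001")

# Serialized as UTF-8, the characters are written as multibyte sequences.
utf8_bytes = doc.toxml(encoding="utf-8")

# Serialized as ASCII, minidom falls back to numeric character references
# for anything the encoding cannot represent.
ascii_bytes = doc.toxml(encoding="ascii")

if __name__ == "__main__":
    print(encoded_lengths(COPYRIGHT))
    print(encoded_lengths(NBSP))
    print(utf8_bytes)
    print(ascii_bytes)
```

In this sketch the in-memory tree holds plain characters, and the choice between raw bytes and character references is made only at serialization time, depending on the output encoding. Whether that is the right trade-off for a given Xalan setup is a separate question.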