xml-general mailing list archives

From Brian Dupras <bri...@centera.com>
Subject Character set problems
Date Mon, 06 Mar 2000 22:28:21 GMT
Hello all - we're currently writing a system that uses xml/java/xerces on an
NT platform, communicating with xml/c/expat on a unix platform, and we're
hitting the wall of character set incompatibility.

The idea is that the end-user will use an HTML form to enter information
(that may contain reserved xml characters like apostrophes, quotes, GTs,
LTs, ampersands, etc).  This "user-text" then gets packaged into an XML-DOM
document on the xml/java/xerces/NT server, streamed out to a java string,
sent over a CORBA link, destreamed by xml/c/expat/unix into a DOM,
processed, and finally streamed into a database.  On the reverse flow, the
"user text" is streamed from the database into the xml/c/expat/unix
processor's DOM, streamed out to a string, sent over CORBA, loaded into a
Java string, then a Xerces DOM, processed, streamed out to a string and sent
out to a web browser.

Input flow:
web client form => xerces/java/NT => c/expat/unix  => database

Output flow:
database => c/expat/unix => xerces/java/NT => web client HTML

When the user inputs characters via web form, what should we do to the text
stream before creating a Xerces DOM?  What should we do before creating a
text stream from Xerces?

When the user requests the info back, what should we do to the text stream
from the data processing layer before creating a Xerces DOM?  What should we
do before sending out to the browser's HTML stream?
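
To make the question concrete, here is a minimal sketch of one way to carry
user text through a DOM without pre-escaping it by hand: put the raw string in
a text node and let the serializer and parser handle the reserved characters.
It uses the standard JAXP classes rather than any Xerces-specific API, and the
class name OpaqueUserText and the element name "user-text" are made up for
illustration. It is not a description of what our system does today.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;

public class OpaqueUserText {

    // Wrap raw user text in a hypothetical <user-text> element and serialize it.
    // The text goes in as a DOM text node, so the serializer escapes & and <.
    static String wrap(String userText) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.newDocument();
            Element root = doc.createElement("user-text");
            root.appendChild(doc.createTextNode(userText));
            doc.appendChild(root);

            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not serialize user text", e);
        }
    }

    // Parse the serialized form and return the original text unchanged.
    static String unwrap(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse user text", e);
        }
    }

    public static void main(String[] args) {
        String raw = "Tom & Jerry's <b>\"show\"</b> &nbsp;";
        String xml = wrap(raw);
        System.out.println(xml);
        System.out.println(unwrap(xml).equals(raw));
    }
}
```

In this sketch the literal text "&nbsp;" is serialized as "&amp;nbsp;", so an
XML parser on the other side sees six ordinary characters rather than an
undeclared entity reference.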

The idea is that the user can enter flat text, or even markup.  But as we're
not sure that any markup is valid XML, it must be treated as completely
opaque.  However, we've run into trouble with things like user-text:"&nbsp;"
being munged between the web and data-processing tiers into some funky
international A character.
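
As an illustration of one way such a symptom can arise (an assumption; nothing
here confirms it is the cause in our system): if some layer resolves "&nbsp;"
to the no-break space U+00A0, writes it out as UTF-8, and a later layer reads
those bytes as ISO-8859-1, the single character becomes two, the first of
which is U+00C2 (capital A with circumflex). The class name NbspMojibake is
made up for this sketch.

```java
import java.nio.charset.StandardCharsets;

public class NbspMojibake {

    // Encode as UTF-8, then decode with the wrong charset (ISO-8859-1).
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Encode and decode with the same charset; the text survives unchanged.
    static String roundTrip(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String nbsp = "\u00A0"; // the character that &nbsp; stands for in HTML

        // Print the code points that result from the mismatched decode.
        for (char c : misdecode(nbsp).toCharArray()) {
            System.out.printf("U+%04X%n", (int) c);
        }

        System.out.println(roundTrip(nbsp).equals(nbsp));
    }
}
```

If this is what is happening, the fix would be to agree on one encoding at
every hop (string to bytes and bytes to string) rather than to escape the text
differently.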

Any advice on this matter is much appreciated.

Brian Dupras
Centera Information Systems, Inc.
phone 303.381.4420 (direct)
phone 303.939.0200 (operator)
fax	303.939.0111
web	http://www.centera.com
email	briand@centera.com
