perl-embperl mailing list archives

From Michael Boudreau
Subject Unicode character entities in form data
Date Fri, 07 May 2004 19:14:20 GMT
This is not strictly an Embperl question, but it concerns a web site 
that I'm running with Embperl, and I hope that other web gurus out 
there might have some suggestions.

I have a number of web forms that submit large blocks of text. Often 
the people who fill in these forms compose their text in a word 
processor and then copy and paste it into the web forms.

Sometimes the text submitted via these forms contains Unicode character 
entities, as in the following sample:

    immunopathogenesis may be triggerred through Fas-, TNF-&#61537;- or
    TGF-&#61538;-derived mechanisms

In this case, the string '&#61537;' represents the Greek letter alpha, 
and '&#61538;' represents beta.

My problem is that users of the data submitted via the web want these 
entities translated to something they can understand, but these 
particular entity values come from the "private use area" of the 
Unicode character set, so as far as I know they can't be reliably 
translated.

I suspect this problem starts on a Windows system, where the Greek 
alpha or beta is displayed with the correct glyph on the user's 
screen, but when the text is pasted into the text box in the browser, 
the character is converted to one of these private-use values. That's 
my theory, anyway.
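
One way to probe this theory is to look at the numeric values themselves. The following is a minimal Python sketch, assuming the entities always arrive as decimal numeric character references of the form `&#N;`. The function name `describe_entities` is made up for illustration. It reports each code point in hex, whether it falls in the Basic Multilingual Plane's Private Use Area (U+E000 through U+F8FF), and its low byte.

```python
import re

# Basic Multilingual Plane Private Use Area: U+E000 through U+F8FF.
PUA_START, PUA_END = 0xE000, 0xF8FF

def describe_entities(text):
    """Return (entity, hex code point, in_pua, low_byte) for each decimal &#N; reference."""
    results = []
    for match in re.finditer(r"&#(\d+);", text):
        cp = int(match.group(1))
        in_pua = PUA_START <= cp <= PUA_END
        # The low byte is the offset within the 256-code-point block the value sits in.
        results.append((match.group(0), f"U+{cp:04X}", in_pua, cp & 0xFF))
    return results

sample = "Fas-, TNF-&#61537;- or TGF-&#61538;-derived mechanisms"
for row in describe_entities(sample):
    print(row)
```

For the sample text, the two values come out as U+F061 and U+F062, both inside the Private Use Area, with low bytes 0x61 and 0x62. Those are the byte values of 'a' and 'b', which is where the Symbol font places alpha and beta. That is consistent with (though not proof of) text set in a symbol-encoded Windows font being exposed as code points in the U+F000 to U+F0FF range.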

Does anybody else recognize this phenomenon? If so, do you have a way 
to translate character entities that are not defined by Unicode? If 
Microsoft is to blame, as I suspect, do they happen to publish 
somewhere a guide to their character entities?
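
If the values do turn out to be symbol-font bytes offset by 0xF000, one possible approach is a lookup table keyed on that offset. The sketch below is only that: the table `SYMBOL_TO_UNICODE` and the function `translate_pua_entities` are hypothetical names, the assumption that the source font was Symbol is a guess, and the table contains only the two values from the sample above. Anything not in the table is left untouched so that nothing is silently mistranslated.

```python
import re

# Hypothetical, deliberately partial table: offset from U+F000 -> standard character.
# Only the two values from the sample are filled in; the rest would have to be
# verified against the actual font the text was composed in.
SYMBOL_TO_UNICODE = {
    0x61: "\u03b1",  # GREEK SMALL LETTER ALPHA
    0x62: "\u03b2",  # GREEK SMALL LETTER BETA
}

def translate_pua_entities(text, table=SYMBOL_TO_UNICODE):
    """Replace decimal &#N; references in U+F000..U+F0FF using the given table."""
    def replace(match):
        offset = int(match.group(1)) - 0xF000
        if 0 <= offset <= 0xFF and offset in table:
            return table[offset]
        # Outside the assumed range, or not in the table: keep the entity as-is.
        return match.group(0)
    return re.sub(r"&#(\d+);", replace, text)

print(translate_pua_entities("TNF-&#61537;- or TGF-&#61538;-derived mechanisms"))
```

Leaving unmapped entities in place makes it easy to log them and extend the table case by case, rather than guessing at characters whose meaning depends on a font that is no longer available.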

Any advice would be most welcome.

Michael R. Boudreau
Senior Electronic Publishing Developer
The University of Chicago Press
1427 E. 60th Street
Chicago, IL 60637
