perl-embperl mailing list archives

From Dirk Melchers <dirk.melch...@nureg.de>
Subject Re: Getting mad with UTF-8
Date Wed, 12 Jun 2013 15:05:23 GMT
Hello Jean-Christophe,

On 12.06.2013 at 16:44, Jean-Christophe Boggio wrote:

> Hello,
> 
> Can someone help me understand what could cause this :
> 
> warn "\$content : ".(utf8::is_utf8($content) ? "utf8" : "not utf8");
> warn "\$ticketdata[0]->[0] : ".(utf8::is_utf8($ticketdata[0]->[0]) ? "utf8" : "not utf8");
> warn "content4=$content";
> if ($ticketdata[0]->[0] ne $content) {
> 	warn "content5=$content";
> 	#
> 	warn "content6=$content stored=".$ticketdata[0]->[0];
> 	warn "content7=$content";
> }
> 

[...]

> I guess the problem comes from the fact that on the same line I have one utf-8 variable
> and one non-utf8 one.
> 
> $content comes from $fdat{content} (not marked as utf8 while the page encoding is declared
> and recognized as utf-8).
> 
> What can I do to force embperl to always set the utf-8 flag on $fdat{...} ?
> 
> If you know a way of telling Apache/EmbPerl that no encoding other than UTF-8 exist in
> the world, I'll take it. And it's not a problem if I'm incompatible with anything.



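I think your guess is right: when a utf8-flagged string and an unflagged string meet in one
statement (a concatenation or a comparison, for example), Perl upgrades the unflagged one to
utf8 as well, and it treats its bytes as ISO-8859-1 for that conversion! Since your $content
really holds UTF-8 bytes, it comes out double-encoded, so your string is destroyed after that.
The same thing happens when you use Freeze::Thaw or Data::Dumper, which is bad for serializing
and storing something in a database :-(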
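
To make the effect visible, here is a minimal sketch. It assumes the form value arrives as raw
UTF-8 bytes; the variable names are made up for the illustration and have nothing to do with
Embperl itself.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# "é" as two raw UTF-8 bytes, the way an undecoded form value would arrive
# (assumption for this sketch).
my $bytes = "\xC3\xA9";

# The same text as a decoded character string (utf8 flag set).
my $chars = decode('UTF-8', "\xC3\xA9");

# Same text, but the comparison fails: Perl upgrades $bytes byte by byte as
# ISO-8859-1, so it compares two characters against one.
my $equal = ($bytes eq $chars) ? 1 : 0;

# Concatenation does the same upgrade, so the byte string shows up as "Ã©".
my $mixed = $chars . $bytes;

printf "equal=%d length=%d\n", $equal, length $mixed;
print encode('UTF-8', $mixed), "\n";
```

This prints "equal=0 length=3" and then "éÃ©": the second "é" is now two characters, which is
the kind of damage you see in your content6 line.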

Embperl decides for itself whether the %fdat parameters are utf8 or not. I don't know how it
does so (maybe Gerald could say something about that), but we had a lot of "funny" things
in the past regarding this problem. Our website uses several different encodings (neither UTF-8
nor ISO-8859-1), so we ran into this trouble. We implemented our own "thaw" method, which tries
to thaw the data and, if that fails, converts the data to utf8 and thaws it again...

A solution for you could be to use "$content=decode('UTF-8',$content)" (decode comes from the
Encode module) to flag your variable, or to walk over %fdat and do the same for every value
that is not already utf8-flagged. After that, you should have utf8-flagged variables only and
everything works as expected.
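
Walking over %fdat could look roughly like this. It is only a sketch: the hash below is a
stand-in so the snippet runs on its own (in a real page Embperl provides %fdat), and skipping
values that are not valid UTF-8 is my assumption about what you would want.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for Embperl's %fdat (hypothetical sample data).
my %fdat = (
    content => "caf\xC3\xA9",                    # raw UTF-8 bytes, flag off
    title   => decode('UTF-8', "na\xC3\xAFve"),  # already decoded, flag on
);

for my $key (keys %fdat) {
    my $value = $fdat{$key};
    next if !defined $value || ref $value;   # skip undef and references
    next if utf8::is_utf8($value);           # already a character string

    # FB_CROAK makes invalid UTF-8 die; the eval then leaves such a value
    # untouched instead of aborting the whole request.
    my $decoded = eval {
        decode('UTF-8', $value, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    $fdat{$key} = $decoded if defined $decoded;
}
```

Run this once at the top of the page, before anything compares or concatenates the values.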

One small additional comment: using variables that hold UTF-8 content without the utf8 flag
(like your $content variable) breaks a lot of Perl stuff: lc, uc, cmp, le, gt, length, sort, ....
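
A small sketch of what "breaks" means for length and uc. It assumes no locale and no
"use feature 'unicode_strings'" is in effect, and the sample word is my own.

```perl
use strict;
use warnings;
use Encode qw(decode);

my $bytes = "\xC3\xA9t\xC3\xA9";       # "été" as raw UTF-8 bytes, flag off
my $chars = decode('UTF-8', $bytes);   # "été" as three characters, flag on

# length counts bytes in the first case and characters in the second.
my $len_bytes = length $bytes;   # 5
my $len_chars = length $chars;   # 3

# uc only touches the ASCII "t" in the byte string; the accented letters
# stay lowercase because Perl does not see them as letters at all.
my $upper_bytes = uc $bytes;     # still the bytes of "éTé"
my $upper_chars = uc $chars;     # "ÉTÉ"

print "$len_bytes $len_chars\n";
```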


With best regards,

Dirk Melchers
/// IT/Software-Development ///

NUREG GmbH ///
Dorfäckerstraße 31 | 90427 Nürnberg | Germany
Tel. +49-911-32002-256 | Fax +49-911-32002-299
Mobil +49-172-9354670 | www.nureg.de
Nürnberg HRB 22653 | USt.ID DE 814 685 653
Geschäftsführer: Michael Schmidt, Stefan Boas



