Subject: Re: Getting mad with UTF-8
From: Dirk Melchers
To: Jean-Christophe Boggio
Cc: "embperl@perl.apache.org"
Date: Wed, 12 Jun 2013 17:05:23 +0200
Message-Id: <31F88583-BA74-41D9-AE8F-383E0E1F35F8@nureg.de>
In-Reply-To: <51B8893B.9070806@thefreecat.org>

Hello Jean-Christophe,

On 12.06.2013 at 16:44, Jean-Christophe Boggio wrote:

> Hello,
>
> Can someone help me understand what could cause this :
>
> warn "\$content : ".(utf8::is_utf8($content) ? "utf8" : "not utf8");
> warn "\$ticketdata[0]->[0] : ".(utf8::is_utf8($ticketdata[0]->[0]) ? "utf8" : "not utf8");
> warn "content4=$content";
> if ($ticketdata[0]->[0] ne $content) {
>     warn "content5=$content";
>     #
>     warn "content6=$content stored=".$ticketdata[0]->[0];
>     warn "content7=$content";
> }
>
[...]
> I guess the problem comes from the fact that on the same line I have one utf-8 variable and one non-utf8 one.
>
> $content comes from $fdat{content} (not marked as utf8 while the page encoding is declared and recognized as utf-8).
>
> What can I do to force embperl to always set the utf-8 flag on $fdat{...} ?
>
> If you know a way of telling Apache/EmbPerl that no encoding other than UTF-8 exist in the world, I'll take it. And it's not a problem if I'm incompatible with anything.

I guess your guess is right. When a utf8-flagged string and an unflagged string are combined in one operation (concatenation, comparison, interpolation), Perl upgrades the unflagged one to utf8 as well, and it interprets its bytes as ISO-8859-1 for that conversion! If the unflagged string actually holds UTF-8 bytes, each multi-byte character turns into two or more wrong characters, so your string is destroyed after that. The same thing happens when you use Freeze::Thaw or Data::Dumper, which is bad for serializing and storing something in a database :-(

Embperl decides for itself whether the %fdat parameters are utf8 or not. I don't know how it does so, maybe Gerald could say something about that, but we have had a lot of "funny" things in the past regarding this problem. Our website is in different encodings (not UTF-8 and not ISO-8859-1), so we ran into trouble. We implemented our own "thaw" method, which tries to thaw the data and, if that fails, converts the data to utf8 and thaws it again...

A solution for you could be: use "$content=decode('UTF-8',$content)" (decode comes from the Encode module) to flag your variable, or walk over %fdat and do it for all values which are not already utf8-flagged. After that, you should have utf8-only variables and everything works as expected.

One little additional comment: using non-utf8-flagged variables with UTF-8 content (such as your $content variable) breaks a lot of Perl operations that work on characters: lc, uc, cmp, le, gt, length, sort, ....
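To make the "walk over %fdat" idea concrete, here is a minimal sketch. It rests on a few assumptions: %fdat below is an ordinary hash standing in for the one Embperl provides, the helper name decode_fdat is made up for this example, and the unflagged values are assumed to hold valid UTF-8 bytes. It is not taken from Embperl itself.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Stand-in for Embperl's %fdat (assumption for this sketch).
my %fdat = (
    # "cafe" with e-acute, as raw UTF-8 bytes and NOT utf8-flagged
    content => "caf\xC3\xA9",
    # already a decoded, utf8-flagged character string
    title   => decode('UTF-8', "na\xC3\xAFve"),
);

# Hypothetical helper: decode every value that is not yet utf8-flagged.
# Flagged values are skipped so nothing gets decoded twice.
sub decode_fdat {
    my ($fdat) = @_;
    for my $key (keys %$fdat) {
        my $value = $fdat->{$key};
        next if !defined $value || ref $value;
        next if utf8::is_utf8($value);
        $fdat->{$key} = decode('UTF-8', $value);
    }
    return $fdat;
}

# A flagged string, e.g. as it might come back from a database.
my $stored = decode('UTF-8', "caf\xC3\xA9");

# Before decoding: the byte string is upgraded as ISO-8859-1 for the
# comparison, so the two values differ, and length counts bytes.
my $equal_before  = ($stored eq $fdat{content}) ? 1 : 0;
my $length_before = length $fdat{content};

decode_fdat(\%fdat);

# After decoding: both sides are character strings.
my $equal_after  = ($stored eq $fdat{content}) ? 1 : 0;
my $length_after = length $fdat{content};

print "equal before: $equal_before, after: $equal_after\n";
print "length before: $length_before, after: $length_after\n";
```

In a real page the call would have to run once per request, before anything else reads %fdat. Whether Embperl has already flagged some of the values is exactly what the utf8::is_utf8 check guards against.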
With best regards,
Dirk Melchers

/// IT/Software-Development
/// NUREG GmbH
/// Dorfäckerstraße 31 | 90427 Nürnberg | Germany
Tel. +49-911-32002-256 | Fax +49-911-32002-299
Mobile +49-172-9354670 | www.nureg.de
Nürnberg HRB 22653 | VAT ID DE 814 685 653
Managing directors: Michael Schmidt, Stefan Boas

---------------------------------------------------------------------
To unsubscribe, e-mail: embperl-unsubscribe@perl.apache.org
For additional commands, e-mail: embperl-help@perl.apache.org