From: Markus Wichitill
To: Boris Zentner
CC: apreq-dev@httpd.apache.org
Date: Fri, 08 Oct 2004 00:29:12 +0200
Subject: Re: Apache::Request, APR::Table and UTF8
Message-ID: <4165C338.8010203@gmx.de>

Boris Zentner wrote:

>>> A simple form of UTF-8 support in Apache::Request that I wouldn't mind
>>> would be a flag "DECODE_UTF8 => 1", that when passed to new() would
>>> cause Apache::Request to call utf8::decode on the returned string every
>>> time param(), body() etc. is called, but would leave the original
>
> This is not possible. The reason is that if you call decode for every
> parameter, *ALL* parameters must be in utf8. That is not true.

As I said, I know this simple all-or-nothing approach is not the solution
you want, but it's certainly "possible" and a solution that would be enough
for most applications. I don't know what kind of mixed input your
application receives in a single request, but in the common case of a
browser-submitted form, it's either all UTF-8 or not, depending on the
encoding of the HTML page (or maybe the accept-charset attribute, if any
browsers support that).

My stance here is that your application with mixed input is rather specific
if not unusual, and therefore needs to do its own handling of the issue,
even if that means you have to do more wrapping/subclassing and have to
educate your co-developers about how to use the resulting interfaces. Which
you already did. A light-weight library like apreq should not try to handle
every possible format under the sun; that's why I've already cautioned
against linking in full XML support.
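
To make the all-or-nothing idea concrete, here is a minimal sketch of doing it at the application level. It is an illustration only: decode_params is a hypothetical helper name, not part of Apache::Request or apreq, and a plain hash stands in for the values a handler would get from param(). The only real calls used are the core utf8::decode function and length.

```perl
use strict;
use warnings;

# Hypothetical helper (not an apreq API): decode every value of a parameter
# hash in place, assuming the whole request was submitted as UTF-8.
# Returns the names of parameters that were not well-formed UTF-8.
sub decode_params {
    my ($params) = @_;
    my @bad;
    for my $name (sort keys %$params) {
        # utf8::decode works in place and returns false on malformed input,
        # in which case the value is left as the original bytes.
        utf8::decode($params->{$name}) or push @bad, $name;
    }
    return @bad;
}

# Stand-in for values a handler would have collected from param().
my %params = (
    title => "Gr\xc3\xbc\xc3\x9fe",   # "Gruesse" with u-umlaut and sharp s, as UTF-8 bytes
    id    => "42",                     # plain 7-bit ASCII
    note  => "caf\xe9",                # Latin-1 bytes, not valid UTF-8
);

my @bad = decode_params(\%params);
printf "%s: %d characters\n", $_, length $params{$_} for sort keys %params;
print "not UTF-8: @bad\n";
```

The list of rejected names is where an application with mixed input would have to apply its own knowledge of the context, which is the part a generic library cannot supply.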

> Also I can think of the DECODE_UTF8 flag from your example as a
> utf8-flag-on-for-all parameters in the table.

Technically utf8::decode doesn't set the flag for 7-bit strings.

> I understand, that APR::Table can not change its current behavior. But
> Apache::Request can and should. I'm really frightened that so many
> emails and examples do not convert you all on how important correct data
> for perl is.

We seem to mainly differ about whose responsibility it is to handle the
UTF-8 issue: the developer's, who has all the relevant info, or that of a
thin XS layer over Apache internals, which doesn't know anything about the
context of the incoming data.

BTW, I think you make things look a bit too simple in your first post by
hiding much of the real complexity behind $something and $something_else.
I'm still not sure if you expect Apache::Request to do heuristic scanning
of the input to automatically determine the encoding (which would be
unreliable at best)?

> Again, no conversion just back what I put in that is the minimum
> requirement for any data-store.

Stop thinking of those tables as data stores; they're APIs to a webserver
that doesn't really support Unicode. Of course I might be biased, since
I've always treated Apache::Request as a read-only object anyway.
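
For reference, the remark above that utf8::decode doesn't set the flag for 7-bit strings can be checked with a short standalone script. This is just a sketch using the core utf8::decode and utf8::is_utf8 functions; the variable names and sample strings are made up for the illustration.

```perl
use strict;
use warnings;

my $ascii = "hello";       # 7-bit characters only
my $utf8  = "\xc3\xa4";    # a-umlaut encoded as two UTF-8 bytes

# utf8::decode converts in place and returns true for well-formed UTF-8.
my $ok_ascii = utf8::decode($ascii);
my $ok_utf8  = utf8::decode($utf8);

# Report the return value, the internal UTF-8 flag, and the character count.
printf "ascii: ok=%d flag=%d length=%d\n",
    $ok_ascii ? 1 : 0, utf8::is_utf8($ascii) ? 1 : 0, length $ascii;
printf "utf8:  ok=%d flag=%d length=%d\n",
    $ok_utf8 ? 1 : 0, utf8::is_utf8($utf8) ? 1 : 0, length $utf8;
```

Both calls succeed, but only the second string comes back with the flag on; the pure-ASCII value is left flagged off, so a decode-everything option would not amount to "utf8 flag on for all parameters".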