httpd-bugs mailing list archives

From bugzi...@apache.org
Subject DO NOT REPLY [Bug 32730] New: - Error 500 on Non-UTF-8 Encoded PATH_INFO on Windows
Date Thu, 16 Dec 2004 11:57:14 GMT
DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://issues.apache.org/bugzilla/show_bug.cgi?id=32730>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND
INSERTED IN THE BUG DATABASE.

http://issues.apache.org/bugzilla/show_bug.cgi?id=32730

           Summary: Error 500 on Non-UTF-8 Encoded PATH_INFO on Windows
           Product: Apache httpd-2.0
           Version: 2.0.52
          Platform: PC
        OS/Version: Windows XP
            Status: NEW
          Severity: normal
          Priority: P3
         Component: Core
        AssignedTo: bugs@httpd.apache.org
        ReportedBy: rd9@donkin.org


Any PATH_INFO string that contains URL-encoded bytes that are not part of a
valid UTF-8 sequence causes Apache 2.0.52 on Windows to give an Internal Server
Error 500, and put the following message in the error.log:

   (22)Invalid argument: utf8 to ucs2 conversion failed on this string: PATH_INFO=/Main/FromageD\xe9rap\xe9

The URL that generated this was as follows ('view' being the CGI script, no
mod_perl):

   http://localhost:8080/cgi-bin/view/Main/FromageD%E9rap%E9
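
To illustrate why the conversion fails on this path, here is a minimal
Python sketch (not Apache code; the helper name is_valid_utf8 is
hypothetical). It percent-decodes a path and checks whether the resulting
bytes form valid UTF-8. In the URL above, the byte 0xE9 would have to start
a three-byte UTF-8 sequence, but it is followed by plain ASCII, so the check
fails. The second path is an assumed UTF-8 encoding of the same name, shown
for comparison.

```python
from urllib.parse import unquote_to_bytes


def is_valid_utf8(percent_encoded: str) -> bool:
    """Return True if the percent-decoded bytes are a valid UTF-8 sequence."""
    raw = unquote_to_bytes(percent_encoded)
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# ISO-8859-1 encoded path from this report: 0xE9 is not valid UTF-8 here.
print(is_valid_utf8("/Main/FromageD%E9rap%E9"))        # False
# The same name encoded as UTF-8 (0xC3 0xA9 per accented letter).
print(is_valid_utf8("/Main/FromageD%C3%A9rap%C3%A9"))  # True
```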

Bug 9223 is similar to this bug, but not a dupe: it covered the QUERY_STRING,
which the web application (TWiki, http://twiki.org) mostly does not use.  As
you'd expect, the following URL works fine:

   http://localhost:8080/cgi-bin/view?topic=Main.FromageD%E9rap%E9

Most Mozilla-derived browsers including Firefox 1.0 generate URLs in the native
character encoding (e.g. ISO-8859-1) by default. In any case, Apache should not
be generating an internal server error, but a less serious error (e.g. file not
found), allowing mod_fileiri or the web application to interpret the encoding
correctly (which TWiki can do as long as it sees the PATH_INFO).
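
As a rough sketch of the kind of handling an application could apply once it
receives the raw PATH_INFO bytes, the following Python tries UTF-8 first and
falls back to a legacy encoding. This is an assumption for illustration, not
TWiki's actual code; the function name decode_path_info and the ISO-8859-1
default are hypothetical choices.

```python
def decode_path_info(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode PATH_INFO bytes, preferring UTF-8 and falling back if invalid."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Not valid UTF-8: assume the site's legacy encoding (an assumption).
        return raw.decode(fallback)


# Both encodings of the same topic name decode to the same text.
legacy = decode_path_info(b"/Main/FromageD\xe9rap\xe9")
utf8 = decode_path_info(b"/Main/FromageD\xc3\xa9rap\xc3\xa9")
print(legacy == utf8)
```

Trying UTF-8 first is reasonably safe because text in a single-byte encoding
with accented characters rarely happens to form valid UTF-8 sequences.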

This appears to be Windows-specific, since TWiki has users of
internationalisation on Apache 2 and Linux who do not see it; it is no doubt
due to Apache's Unicode support on Windows.

I realise that such non-UTF-8 URLs are not standards conformant, but if the web
application is willing to handle them specially, I think that Apache should at
least pass them on without trying to convert them (a configuration option to
turn off this conversion would be very useful).

This bug also prevents use of mod_fileiri, which enables such undesirable URLs
to be redirected to conformant UTF-8 encoded URLs.  As Martin Duerst has
confirmed, mod_fileiri runs in the Apache 'fixup' phase.
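
For illustration only, the rewrite from a legacy-encoded URL path to a
conformant UTF-8 one can be sketched in Python as below. This is not
mod_fileiri's implementation; the function name to_utf8_url_path is
hypothetical, and the legacy encoding is assumed to be known in advance
(ISO-8859-1 here).

```python
from urllib.parse import quote, unquote


def to_utf8_url_path(path: str, legacy_encoding: str = "iso-8859-1") -> str:
    """Re-encode a percent-encoded path from a legacy encoding to UTF-8."""
    # Interpret the percent-encoded bytes in the assumed legacy encoding.
    text = unquote(path, encoding=legacy_encoding)
    # quote() percent-encodes non-ASCII characters as UTF-8 by default.
    return quote(text, safe="/")


print(to_utf8_url_path("/Main/FromageD%E9rap%E9"))
# /Main/FromageD%C3%A9rap%C3%A9
```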

For more information and workarounds from a TWiki perspective, see
http://twiki.org/cgi-bin/view/Codev/ApacheTwoBreaksNonUTF8EncodedURLsOnWindows

-- 
Configure bugmail: http://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.

---------------------------------------------------------------------
To unsubscribe, e-mail: bugs-unsubscribe@httpd.apache.org
For additional commands, e-mail: bugs-help@httpd.apache.org

