httpd-dev mailing list archives

From Micha Lenk
Subject Re: [PATCH] mod_xml2enc eats end of file
Date Mon, 05 Nov 2012 09:57:28 GMT
Hi Nick,

On 11/02/2012 07:25 PM CEST +02:00, Nick Kew wrote:
>> just debugged a case where Apache used as reverse proxy filters a
>> text/javascript file through mod_proxy_html and mod_xml2enc. As
>> mod_proxy_html sees no business in filtering that file, it removes
>> itself from the filter chain, but mod_xml2enc still tries to do its job.
> That looks like a logic bug you've found!

Yes, that's also possible.

> It looks like an edge case: one you'll only see when the charset coming
> from the backend is not supported by libxml2 on your platform, so that
> mod_xml2enc converts it using apr_iconv.

No, not exactly that edge case. The backend server sends the response
header "Content-Type: text/javascript", i.e. without any charset
information. From what I've seen in GDB, mod_xml2enc seems to fall back
to assuming that the server sends ISO-8859-1 and converts that to UTF-8
without reporting an error (even though the file actually seems to be a
mix of ISO-8859-1 and UTF-8).
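
To illustrate why such a conversion can never fail and why it presumably
leaves a stale Content-Length behind, here is a minimal sketch in C. The
function latin1_to_utf8() is a hypothetical helper written for this
example, not code from mod_xml2enc. Every byte value is valid
ISO-8859-1, so the conversion accepts anything, and each byte >= 0x80
grows to two bytes in the output.

```c
#include <stddef.h>

/* Hypothetical helper (not from mod_xml2enc): convert ISO-8859-1 bytes
 * to UTF-8. The caller must provide room for 2 * len bytes in out.
 * Returns the number of bytes written. */
size_t latin1_to_utf8(const unsigned char *in, size_t len,
                      unsigned char *out)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (in[i] < 0x80) {
            /* ASCII range: copied unchanged, one byte in, one byte out */
            out[n++] = in[i];
        } else {
            /* 0x80..0xFF: always becomes a two-byte UTF-8 sequence */
            out[n++] = (unsigned char)(0xC0 | (in[i] >> 6));
            out[n++] = (unsigned char)(0x80 | (in[i] & 0x3F));
        }
    }
    return n;
}
```

Feeding this function input that is already UTF-8, e.g. the two bytes
C3 A9, yields the four bytes C3 83 C2 A9. The body gets longer than the
length announced by the backend, so a client that trusts the original
Content-Length would stop reading before the end of the file.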

>> The attached patch based on httpd-trunk fixes that issue by removing the
>> Content-Length header entirely. Please review it. I would appreciate it,
>> if it could get applied to trunk and then backported to the httpd-2.4.x
>> branch.
> Your patch fixes the immediate bug (thanks!), but the fact that
> mod_xml2enc is doing anything at all in the case you describe is a
> bigger bug.

OK, I also wondered whether mod_xml2enc staying active is a bug, but was
not sure, so I only fixed the immediate bug. If you consider it a bug,
I assume the Content-Type checks just need to be unified. In
mod_proxy_html, check_filter_init() checks for "text/html" or
"application/xhtml+xml", whereas in mod_xml2enc, xml2enc_ffunc() checks
for the prefix "text/" or for "xml" anywhere in the content type string.
This is inconsistent: mod_proxy_html skips text/javascript (or
text/css) files, while mod_xml2enc still processes them.
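
The mismatch can be sketched as two standalone predicates. This is a
simplified illustration, not the actual module code: the function names
proxy_html_style_match() and xml2enc_style_match() are made up for this
example, and the comparison is case-sensitive for brevity, which may
differ from what the modules really do.

```c
#include <string.h>

/* Hypothetical stand-in for the check described for mod_proxy_html:
 * accept only text/html and application/xhtml+xml. */
int proxy_html_style_match(const char *ctype)
{
    return strncmp(ctype, "text/html", 9) == 0
        || strncmp(ctype, "application/xhtml+xml", 21) == 0;
}

/* Hypothetical stand-in for the check described for mod_xml2enc:
 * accept anything starting with "text/" or containing "xml". */
int xml2enc_style_match(const char *ctype)
{
    return strncmp(ctype, "text/", 5) == 0
        || strstr(ctype, "xml") != NULL;
}
```

For "text/javascript" the first predicate returns 0 and the second
returns 1, which is the situation described above: one filter steps
aside while the other still converts the body.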

> There's no easy solution: mod_proxy_html delays some of the checks
> until it has a first chunk of data, to allow for cases where an earlier
> filter (e.g. XSLT) might affect Content-Type.  But by that time it's
> too late to insert or uninsert the xml2enc filter, as that needs to go
> in front of the proxy_html filter.

Yes, the delayed checks also seem necessary for the charset guessing in
case no charset is specified.

But what about making the Content-Type check consistent?

