httpd-dev mailing list archives

From Nick Kew <>
Subject Re: Revisiting: xml2enc, mod_proxy_html and content compression
Date Tue, 17 Dec 2013 11:47:10 GMT

On 17 Dec 2013, at 10:32, Thomas Eckert wrote:

> I've been over this with Nick before: mod_proxy_html uses mod_xml2enc to do the detection
> magic but mod_xml2enc fails to detect compressed content correctly. Hence a simple "ProxyHTMLEnable"
> fails when content compression is in place.

Aha!  Revisiting that, I see I still have an uncommitted patch to make
the content types to be processed configurable.  I think that was an issue you
originally raised?  But compression is another issue.

> To work around this without dropping support for content compression you can do
>   SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE
> or at least that was the kind-of-result of the half-finished discussion last time.

I didn't find that discussion.  But I suspect my reaction would have included
a certain aversion to that level of processing overhead in the proxy in these
days of fatter pipes and hardware compression.

> Suppose the client does
>   GET /something.tar.gz HTTP/1.1
>   ...
>   Accept-Encoding: gzip, deflate
> to which the backend will respond with 200 but *not* send a "Content-Encoding" header
> since the content is already encoded. Using the above filter chain "corrupts" the content
> because it will be inflated and then deflated, double compressing it in the end.

If the backend sends compressed contents with no content-encoding, doesn't that imply:
1. INFLATE doesn't see an encoding, so it steps away.
2. xml2enc and proxy-html can't parse compressed content, so they step away (log an error?)
3. DEFLATE … aha, that's what you meant about double-compression.
In effect the whole chain was reduced to just DEFLATE.  That's a bit nonsensical
but not incorrect, and the user-agent will reverse the DEFLATE and restore the
original from the backend, yesno?
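Steps 1 to 3 above can be checked with a toy model. This is not mod_deflate or mod_proxy_html code; it is a hypothetical Python sketch assuming that INFLATE only acts on a `Content-Encoding: gzip` header, that the markup filters pass a compressed stream through untouched, and that the user-agent removes exactly one declared layer of Content-Encoding. Under those assumptions the client ends up with the backend's bytes unchanged: wasteful, but not corrupt.

```python
import gzip


def inflate(headers: dict, body: bytes) -> bytes:
    # Toy INFLATE: only unpacks a body whose response declares gzip.
    if headers.get("Content-Encoding") == "gzip":
        del headers["Content-Encoding"]
        return gzip.decompress(body)
    return body  # no Content-Encoding, so step away


def markup_filters(headers: dict, body: bytes) -> bytes:
    # Toy xml2enc + proxy-html: a gzip stream is not parseable markup,
    # so this model simply passes the body through unchanged.
    return body


def deflate(headers: dict, body: bytes) -> bytes:
    # Toy DEFLATE: compress and declare the new encoding.
    headers["Content-Encoding"] = "gzip"
    return gzip.compress(body)


def user_agent(headers: dict, body: bytes) -> bytes:
    # The client undoes exactly one declared layer of Content-Encoding.
    if headers.get("Content-Encoding") == "gzip":
        return gzip.decompress(body)
    return body


# Backend sends an already-gzipped "tarball" with no Content-Encoding header.
original = gzip.compress(b"pretend this is a tar archive\n" * 200)
headers = {"Content-Type": "application/x-gzip"}

body = original
for stage in (inflate, markup_filters, deflate):
    body = stage(headers, body)

received = user_agent(headers, body)
print(received == original)  # True
print(len(original), len(body))  # compare the sizes of the two gzip layers
```

If a client did not honour the added Content-Encoding (for example, saving the still-wrapped body to disk), it would see the double compression Thomas describes; the model only covers a client that does.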

> IMHO this whole issue lies with proxy_html using xml2enc to do the content type detection
> and xml2enc failing to detect the content encoding. I guess all it really takes is to have
> xml2enc inspect the headers_in to see if there is a "Content-Encoding" header and then add
> the inflate/deflate filters (unless there is a general reason not to rely on the input headers,
> see below).

Well in this particular case, surely it lies with the backend?
But is the real issue anything more than an inability to use ProxyHTMLEnable
with compressed contents?  In which case, wouldn't mod_proxy_html be the
place to patch?  Have it test/insert deflate at the same point as it inserts xml2enc?
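Wherever the check ends up (mod_xml2enc as Thomas proposes, or mod_proxy_html as suggested here), the decision itself is small. The following is a hypothetical Python sketch of that logic, not httpd's C filter API: the function name `plan_output_filters`, the list of markup types, and the rule "wrap with INFLATE/DEFLATE only when the response declares gzip" are all assumptions for illustration.

```python
from typing import List, Optional

# Assumed set of content types that the markup filters should handle.
MARKUP_TYPES = ("text/html", "application/xhtml+xml")


def plan_output_filters(content_type: str,
                        content_encoding: Optional[str]) -> List[str]:
    """Hypothetical: pick output filters for a proxied response."""
    # Only markup is rewritten; anything else gets no extra filters at all.
    if not content_type.lower().startswith(MARKUP_TYPES):
        return []
    chain = ["xml2enc", "proxy-html"]
    # A declared gzip encoding must be undone before parsing and redone after.
    if content_encoding is not None and content_encoding.strip().lower() == "gzip":
        chain = ["INFLATE"] + chain + ["DEFLATE"]
    return chain


print(plan_output_filters("text/html; charset=iso-8859-1", "gzip"))
# ['INFLATE', 'xml2enc', 'proxy-html', 'DEFLATE']
print(plan_output_filters("text/html", None))
# ['xml2enc', 'proxy-html']
print(plan_output_filters("application/x-gzip", None))
# []
```

Keying on Content-Type as well as Content-Encoding means that in the .tar.gz case no filters are inserted at all, so the double-compression question never arises.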

> Of course, this whole issue would disappear if inflate/deflate would be run automagically
> (upon seeing a Content-Encoding header) in general. Anyway, what's the reasoning behind not
> having them run always and giving them the knowledge (e.g. about the input headers) to get out
> of the way if necessary?

That's an interesting thought.  mod_deflate will of course do exactly that
if configured, so the issue seems to boil down to configuring that filter chain.

The ultimate chain here would be:
1.  INFLATE     // unpack compressed contents
2.  xml2enc     // deal with charset for libxml2/mod_proxy_html
3.  proxy-html  // fix URLs
4.  xml2enc     // set an output encoding other than utf-8
5.  DEFLATE     // compress

That's not possible with SetOutputFilter or FilterChain and family, because
you can't configure both instances of xml2enc at once (that's what
ProxyHTMLEnable deals with).  But of those steps, 4 and 5 seem low-priority,
as they're not doing anything really essential.
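The five-stage chain above can be modelled end to end to see why step 4 is the dispensable one. This is a hypothetical Python sketch, not the modules' code: `run_chain`, the backend and public URL prefixes, and the plain string replace standing in for mod_proxy_html's parser-based URL rewriting are all assumptions. Passing `out_charset=None` skips step 4, which gives the reduced `INFLATE;xml2enc;proxy-html;DEFLATE` chain.

```python
import gzip
from typing import Optional, Tuple


def run_chain(body: bytes, in_charset: str, out_charset: Optional[str],
              backend_prefix: str, public_prefix: str) -> Tuple[bytes, str]:
    """Toy model of INFLATE;xml2enc;proxy-html;xml2enc;DEFLATE.

    Returns the compressed body and the charset it is encoded in.
    """
    # 1. INFLATE: unpack the gzip-encoded backend response.
    raw = gzip.decompress(body)
    # 2. xml2enc: decode from the backend charset (a Python str stands in
    #    for the UTF-8 that libxml2 works with).
    text = raw.decode(in_charset)
    # 3. proxy-html: rewrite backend links (a string replace, not a parser).
    text = text.replace('href="' + backend_prefix, 'href="' + public_prefix)
    # 4. xml2enc (optional): re-encode to a non-UTF-8 output charset.
    charset = out_charset if out_charset else "utf-8"
    encoded = text.encode(charset)
    # 5. DEFLATE: recompress for the client.
    return gzip.compress(encoded), charset


page = '<p>Café</p><a href="http://backend.internal/docs">docs</a>'
backend_body = gzip.compress(page.encode("iso-8859-1"))

# Full five-stage chain, sending ISO-8859-1 back out.
full, cs_full = run_chain(backend_body, "iso-8859-1", "iso-8859-1",
                          "http://backend.internal/", "/app/")
# Reduced chain without step 4: the output stays UTF-8.
reduced, cs_reduced = run_chain(backend_body, "iso-8859-1", None,
                                "http://backend.internal/", "/app/")

print(cs_full, gzip.decompress(full).decode(cs_full))
print(cs_reduced, gzip.decompress(reduced).decode(cs_reduced))
```

Both runs give the client the same rewritten page once it decompresses and decodes with the returned charset; only the byte encoding differs. In this model that holds as long as the charset label travels with the body, i.e. the response advertises utf-8 when step 4 is skipped.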

Returning to:
> SetOutputFilter INFLATE;xml2enc;proxy-html;DEFLATE

AFAICS the only thing that's missing is the nonessential step 4 above.

Am I missing something?

Nick Kew