httpd-users mailing list archives

From Thomas Eckert <Thomas.Eck...@Sophos.com>
Subject Re: [users@httpd] mod_proxy_html, HTML rewrite and content compression
Date Fri, 04 Jan 2013 08:23:58 GMT
On 11/16/2012 05:12 PM, Nick Kew wrote:
> On Fri, 16 Nov 2012 11:31:38 +0100
> Thomas Eckert<Thomas.Eckert@Sophos.com>  wrote:
>
>> Thanks for the hint but unfortunately "manually" adding xml2enc to the
>> filtering chain does not help.
> Looks like you've got problems over and above anything to do with
> your configuration!
>
>>       "SetOutputFilter INFLATE;proxy-html" gets the page displayed correctly
> I thought you said it had charset issues?
>
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(259): [client
>> 10.10.10.10:40388] AH01434: Charset ISO-8859-1 not supported by libxml2;
>> trying apr_xlate
> That seems implausible.  How do you get a libxml2 install that
> doesn't natively support ISO-8859-1 (latin1)?
>
>> [pid 15039:tid 3007834992] mod_xml2enc.c(490): (22)Invalid argument:
>> [client 10.10.10.10:40388] AH01441: xml2enc: converted 1/1 bytes
>> [pid 15039:tid 3007834992] [client 10.10.10.10:40388] AH01444: Skipping
>> invalid byte(s) in input stream!
>> (and more conversion errors)
> It looks as if your backend incorrectly identifies the charset
> of the page in question.  Either that or you found a bug.
> Do you have a URL where your unprocessed page could be viewed?
>
Sorry for the delay on this. The basic problem remains: if I enable HTML 
rewriting and connect with a client that requests content compression, the 
reverse proxy fails with a message pointing at libxml2/encoding. I 
also see different log entries depending on whether I set the 
charset of the page.

So if I just send the page with "Content-Type: text/html", this is what I get:

mod_deflate.c(1283): [client 10.10.10.10:39771] AH01398: Zlib: Inflated 
348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:39771] AH01430: Content-Type is 
text/html
mod_xml2enc.c(259): [client 10.10.10.10:39771] AH01434: Charset 
ISO-8859-1 not supported by libxml2; trying apr_xlate
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: 
consuming 682 bytes from bucket
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc: 
converted 682/682 bytes
mod_deflate.c(763): [client 10.10.10.10:39771] AH01384: Zlib: Compressed 
668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: 
consuming 10 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder 
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] 
AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): [client 10.10.10.10:39771] AH01441: xml2enc: 
converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:39771] AH01439: xml2enc: 
consuming 344 bytes from bucket
[client 10.10.10.10:39771] xml2enc_html_entity_fixups(): Transcoder 
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] 
AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] 
AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:39771] 
AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:39771] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(481): [client 10.10.10.10:39771] AH01440: xml2enc: 
reinserting 334 unconsumed bytes from bucket
[client 10.10.10.10:39771] AH01385: Zlib error -2 flushing zlib output 
buffer ((null))


But if "Content-Type: text/html; charset=ISO-8859-1" is sent, this is 
what I get:

mod_deflate.c(1283): [client 10.10.10.10:40040] AH01398: Zlib: Inflated 
348 to 682 : URL /
mod_xml2enc.c(183): [client 10.10.10.10:40040] AH01430: Content-Type is 
text/html;charset=ISO-8859-1
[client 10.10.10.10:40040] AH01431: Got charset ISO-8859-1 from HTTP headers
mod_deflate.c(763): [client 10.10.10.10:40040] AH01384: Zlib: Compressed 
668 to 344 : URL /
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc: 
consuming 10 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder 
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] 
AH01441: xml2enc: converted 1/1 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): [client 10.10.10.10:40040] AH01441: xml2enc: 
converted 9/8 bytes
mod_xml2enc.c(463): [client 10.10.10.10:40040] AH01439: xml2enc: 
consuming 344 bytes from bucket
[client 10.10.10.10:40040] xml2enc_html_entity_fixups(): Transcoder 
failure (rv=-2)
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] 
AH01441: xml2enc: converted 4/4 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] 
AH01441: xml2enc: converted 4/3 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(490): (22)Invalid argument: [client 10.10.10.10:40040] 
AH01441: xml2enc: converted 1/0 bytes
[client 10.10.10.10:40040] AH01444: Skipping invalid byte(s) in input 
stream!
mod_xml2enc.c(481): [client 10.10.10.10:40040] AH01440: xml2enc: 
reinserting 334 unconsumed bytes from bucket

From what I can tell, this still seems to be the "wrong" processing, as 
the page cannot be inflated correctly at the user's end. Nevertheless, 
the message
   AH01434: Charset ISO-8859-1 not supported by libxml2; trying apr_xlate
no longer shows up. Looking at mod_xml2enc.c lines 185-194 and 251-268, 
that makes sense, but it would imply that the encoding detection in lines 
198-206 failed. I suggest adding some sort of "failed" debug message for 
the case where xmlDetectCharEncoding() does not work as desired.

I've tried a couple more combinations, including using mod_charset_lite 
and different non-latin1 encodings on the backend, but the only thing 
that works is using the Header directive on the backend to set 
"Content-Type: text/html; charset=UTF-8" while leaving the actual 
contents unchanged. Here, "works" means the page is displayed correctly 
at the client's end.
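
For reference, the backend workaround described above could look roughly like the following. This is only a sketch: it assumes the backend is itself an Apache httpd with mod_headers loaded, and the <Location> path is a placeholder rather than the real setup.

```apache
# Backend server (assumed to be Apache httpd with mod_headers loaded).
# The "/" path is a placeholder, not taken from the real configuration.
<Location "/">
    # Force the charset in the response header; the page contents
    # themselves are left unchanged.
    Header set Content-Type "text/html; charset=UTF-8"
</Location>
```

Since the page is plain ASCII, labelling it as UTF-8 does not misdescribe the bytes, but this only sidesteps the charset handling on the proxy and does not explain the failure.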

The goal is still to get mod_proxy_html to rewrite the HTML just as it 
would with "ProxyHTMLEnable On" while retaining compression support. 
So setting
  SetOutputFilter INFLATE;proxy-html
which "drops out" the "xml2enc" filter, might be problematic.

Unfortunately, the page is not publicly accessible. It is rather simple, 
though, and I made sure there is nothing 'special' on that page - e.g. 
it's just plain ASCII, no meta tags, etc.

Note that I tried both "ProxyHTMLEnable On" and "SetOutputFilter 
INFLATE;proxy-html" as filter directives for all of the setups mentioned 
above. Neither worked, except with the forced UTF-8 header.
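
To make the two variants concrete, the proxy side looks roughly like the sketch below. The backend host name, the path and the URL map are placeholders for illustration, not the real configuration; mod_proxy, mod_proxy_http, mod_proxy_html, mod_xml2enc and mod_deflate are assumed to be loaded.

```apache
# Reverse proxy front end; backend.example and "/" are placeholders.
<Location "/">
    ProxyPass        "http://backend.example/"
    ProxyPassReverse "http://backend.example/"

    # Variant 1: let mod_proxy_html insert its filter itself. With
    # mod_xml2enc loaded, this also sets up the xml2enc charset handling.
    ProxyHTMLEnable On

    # Variant 2 (used instead of variant 1): name the filters explicitly.
    # xml2enc is not part of this chain.
    # SetOutputFilter INFLATE;proxy-html

    # Rewrite backend links so they point at the proxy.
    ProxyHTMLURLMap "http://backend.example/" "/"
</Location>
```

Only one of the two variants is active at a time; everything else stays the same between test runs.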
