httpd-users mailing list archives

From André Warnier ...@ice-sa.com>
Subject Re: [users@httpd] mod_ext_filter cmd output is garbage
Date Wed, 21 Oct 2009 22:32:58 GMT
I'll continue to top-post.

I really don't know enough about how mod_proxy handles things in the
forward direction to be able to help more.  I see mod_deflate mentioned
in your log, indicating that some compression is taking place, but I
don't know whether that happens before or after your filter comes into play.

One thing, though: the response from the remote server may be text/html
(in the Content-Type header), but it may also be compressed (as indicated
by the Content-Encoding header, e.g. "gzip"). I don't know whether
mod_proxy, per se, always decompresses it before passing it to, or
through, your filter. If not, then your filter may be seeing alternately
uncompressed and compressed HTML pages, and your example sed filter may
just have been "lucky" and happened to run only on uncompressed content.
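To make the compression point concrete, here is a minimal sketch in Python (not the PHP of your filter). It assumes the filter gets the whole response body at once as bytes, and relies on the fact that every gzip stream starts with the two bytes 0x1f 0x8b. The function name filter_body and its arguments are hypothetical:

```python
import gzip

# Every gzip stream starts with these two bytes (RFC 1952).
GZIP_MAGIC = b"\x1f\x8b"


def filter_body(body: bytes, old: bytes, new: bytes) -> bytes:
    """Replace old with new in a body that may or may not be gzip-compressed.

    Assumption (sketch only): the whole body is available at once and is
    either entirely gzip data or entirely uncompressed.
    """
    compressed = body[:2] == GZIP_MAGIC
    if compressed:
        # Work on the real content, not on the compressed bytes.
        body = gzip.decompress(body)
    body = body.replace(old, new)
    # Re-compress so the body still matches the Content-Encoding header.
    return gzip.compress(body) if compressed else body
```

Running a byte-level s/foo/bar/ directly on gzip data will either match nothing or corrupt the compressed stream.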

For the charset and encoding, you have to look at the possible "charset"
parameter in the Content-Type header.  There too, you may have been lucky:
characters in the strict US-ASCII printable range have the same encoding
in ISO-8859-1 (the default in HTTP) as in UTF-8 (a single byte per
character, and incidentally the same byte value).  But if you ever get HTML
pages with those funny accented non-English characters, that is no
longer the case, and your s/foo/bar/ substitution may create a real mess.

And we haven't even started talking about chunked encoding here...
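For the record, chunked transfer coding (RFC 2616, section 3.6.1) sends the body as a series of pieces, each prefixed by its length in hex. A minimal decoder, as a sketch that ignores trailers and does no error checking, also shows that a string like "foo" can be split across two chunks, so anything searching the raw framing could miss it:

```python
def dechunk(data: bytes) -> bytes:
    """Decode an HTTP/1.1 chunked body; trailers are ignored, no validation."""
    out = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        # chunk-size is hex, optionally followed by ";chunk-extensions"
        size = int(data[pos:eol].split(b";")[0], 16)
        pos = eol + 2
        if size == 0:  # last-chunk marks the end of the body
            return bytes(out)
        out += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF
```

For example, b"2\r\nfo\r\n1\r\no\r\n0\r\n\r\n" decodes to b"foo" even though "foo" never appears contiguously on the wire.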

All in all, for this kind of usage, and assuming that all you are
trying to do is add some kind of footer to incoming HTML pages,
even for that you would really need to
a) parse the incoming HTML into some kind of memory structure
b) insert your stuff where appropriate in the structure
c) re-assemble the HTML before forwarding it to the client
I am not sure that the investment needed to do that is really worth the
expected benefit.
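Steps a) to c) can be sketched with Python's standard html.parser module. Strictly speaking this is an event-based parse rather than a full in-memory tree, which is enough for a footer: every parse event is written back out, and a (hypothetical) footer string is emitted just before the closing </body> tag. It assumes the body has already been decompressed and decoded to text:

```python
from html.parser import HTMLParser


class FooterInserter(HTMLParser):
    """Re-emits parsed HTML, inserting a footer just before </body>."""

    def __init__(self, footer: str):
        # convert_charrefs=False delivers &name; and &#NNN; as separate
        # events, so they can be written back unchanged.
        super().__init__(convert_charrefs=False)
        self.footer = footer
        self.out = []
        self.inserted = False

    def handle_starttag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # original text, attributes intact

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag == "body" and not self.inserted:
            self.out.append(self.footer)
            self.inserted = True
        self.out.append(f"</{tag}>")  # note: tag names come back lowercased

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")


def add_footer(html: str, footer: str) -> str:
    parser = FooterInserter(footer)
    parser.feed(html)
    parser.close()
    if not parser.inserted:  # no </body> found: append at the end
        parser.out.append(footer)
    return "".join(parser.out)
```

Even this much does nothing about malformed pages, which is part of why the effort may not pay off.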

By the way, have you looked at:
http://httpd.apache.org/docs/2.2/mod/mod_substitute.html
(but I'm not sure even that one takes charsets into account).
and maybe also
http://httpd.apache.org/docs/2.2/mod/mod_charset_lite.html

I also vaguely remember that there was a module that really allowed you
to modify HTML content on the way out, but I can't find it in the list of
standard Apache 2.2 modules.


Marcos Mendez wrote:
> Yes absolutely. I've set up a forward proxy, where I have to open a
> port (8080) for people to use it. I've set the filter type to
> text/html. So I guess it's definitely an encoding issue. Any idea how
> to solve that? Strangely enough, the sed filter examples work no
> matter what. So I don't understand why this doesn't.
> 
> I'm including the log output for the request...
> 
...

> [Wed Oct 21 17:25:31 2009] [debug] mod_ext_filter.c(628): [client
> 172.16.1.199] filtering `http://skyblender.com/' of type `text/html'
> through `/etc/apache2/simple.php', cfg ExtFilterOptions DebugLevel=10
> NoLogStderr !PreserveContentLength ExtFilterInType text/html
> ExtFilterOuttype (unchanged)
...
> [Wed Oct 21 17:25:31 2009] [debug] mod_deflate.c(619): [client
> 172.16.1.199] Zlib: Compressed 531 to 362 : URL http://skyblender.com/
> [Wed Oct 21 17:25:31 2009] [debug] mod_proxy_http.c(1807): proxy: end body send
> [Wed Oct 21 17:25:31 2009] [debug] proxy_util.c(2009): proxy: HTTP:
> has released connection for (*)
> 


---------------------------------------------------------------------
The official User-To-User support forum of the Apache HTTP Server Project.
See <URL:http://httpd.apache.org/userslist.html> for more info.
To unsubscribe, e-mail: users-unsubscribe@httpd.apache.org
   "   from the digest: users-digest-unsubscribe@httpd.apache.org
For additional commands, e-mail: users-help@httpd.apache.org

