perl-modperl mailing list archives

From André Warnier
Subject Re: mod_perl output filter and mod_proxy, mod_cache
Date Thu, 14 Jul 2011 20:07:24 GMT
Tim Watts wrote:
> Hi,
> Is it in theory possible to insert a perl output filter between 
> mod_proxy and mod_cache?
> Or at least between mod_proxy and the client?

> mod_headers and mod_proxy don't seem to play well together and mod-cache 
> doesn't either (probably due to lack of cache control headers in the 
> tomcat response, though I haven't proved this is actually the case).

Back to the main issue.

See this as just a bit more generic information about how you could think of solving
your problem, apart from the other suggestions already submitted.

1) I am not sure about mod_perl I/O filters, because I never used them. (*)
But in order to (conditionally/unconditionally) insert/delete/modify request/response 
headers, you can also write your own perl handler, and by choosing the appropriate type of
PerlHandler, you can have it run at just about any point in the request/response cycle.

The real power of mod_perl (if you haven't yet discovered that aspect) is that it allows
you to insert your own code at just about any point of the Apache request processing 
cycle, and to do just about anything you want with any aspect of the request/response.
That includes "interfering" with anything that other, non-perl, Apache modules do.

See the following page for a good overview of the Apache request processing cycle, and 
what you can do with such PerlHandlers :
You are probably more interested in the "HTTP Protocol" section.  By clicking on each item
in that list, you get an explanation of /when/ that type of handler runs.
(It's also indirectly a very good introduction to how Apache itself works).

Such handlers are usually easy to write and configure, and the code to play with HTTP 
headers is also quite simple, if you know what to put in the header(s).
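
As a rough sketch of how small such a handler can be, assuming mod_perl 2: the module
name My::TagRequest and the header X-Example-Frontend are made up for illustration, and
the choice of the fixup phase is only one possibility among the phases listed on that page.

```perl
package My::TagRequest;   # hypothetical module name

use strict;
use warnings;

use Apache2::RequestRec ();   # provides $r->headers_in and $r->headers_out
use APR::Table ();            # provides set/get/unset on the header tables
use Apache2::Const -compile => qw(OK);

# Configured with, for example:  PerlFixupHandler My::TagRequest
sub handler {
    my $r = shift;

    # Add a (hypothetical) request header before the content phase,
    # so that whatever generates the response can see it.
    $r->headers_in->set('X-Example-Frontend' => 'apache');

    return Apache2::Const::OK;
}

1;
```

The response side uses $r->headers_out in the same way, but whether a value set this
early survives once mod_proxy has produced the response is something to test rather
than assume.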

2) about mod_headers and mod_proxy playing together :
The trouble is that (contrary to the mod_perl documentation above) the documentation of
most Apache modules does not usually make it clear during which exact phase of the
Apache request processing each module runs.

But I seem to remember something in mod_headers about an "early" attribute or parameter.
Maybe that tells you more of when it runs (or can run), compared to mod_proxy.
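
For reference, a minimal sketch of what that "early" keyword looks like in a
configuration. The header names are invented for illustration, and whether early mode
actually helps in combination with mod_proxy is untested here.

```apache
# Default mode: applied while the response is on its way out.
Header set X-Example-Late "late"

# "early" mode: applied right at the start of request processing.
# The mod_headers documentation describes this mode as a test/debugging
# aid, usable only in the main server or virtual host context.
Header set X-Example-Early "early" early
```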

3) In the documentation of mod_proxy, there should be a possibility to configure it inside
of a <Location(Match)> section, instead of "globally" (outside of any section).
That forces you to decide more finely which URLs should or should not be proxied/forwarded
to Tomcat, but it also (in my view) makes it easier to see how to combine the proxying
instruction with other modules, like perl filters or handlers.
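
A sketch of that layout, assuming Tomcat listens on localhost:8080 and the application
lives under /app/ (both are placeholders, as is the module name My::Handler):

```apache
<Location /app/>
    # Inside a <Location>, ProxyPass takes only the target URL.
    ProxyPass        http://localhost:8080/app/
    ProxyPassReverse http://localhost:8080/app/

    # Other per-location directives can sit right next to it,
    # for example a mod_perl handler (hypothetical module name).
    PerlFixupHandler My::Handler
</Location>
```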

In effect, from Apache's point of view, mod_proxy must be the equivalent of a 
"content-generating handler" (like a PerlResponseHandler), because for Apache, passing a 
request to mod_proxy for processing is not much different than passing it to any other 
internal response-generating handler.
Apache in fact knows nothing of Tomcat.  It passes a request to mod_proxy, and expects the
response (or an error status) back from mod_proxy.  It has no idea that behind mod_proxy 
is another server.

4) strictly according to the HTTP protocol, a "GET" request should be "idempotent", which
means (roughly) that running it twice or more should always give the same answer.
Which in theory means that even if the GET request goes to a database, the response should
be cacheable under most circumstances.
Unfortunately, the practice is such that the GET request is much overused, and it is not 
always that way.
But if caching the response creates problems, you can always tell your application 
developers that it is their fault because they are misusing the protocol.

(In really strict terms, a GET /could/ provide a different response; but it should not 
modify the state of the server).

5) despite what I am saying in (4), a GET response can very validly be different from a 
previous GET response with the same URL (for example, if in-between the data has been 
modified by a POST).  So if you are forcing headers on the responses, you should at least
be a bit careful not to do this indiscriminately.

That is also why I personally have a doubt about the effectiveness of another caching 
proxy front-end, like the couple that were mentioned earlier.  If the Tomcat web
applications themselves do not provide headers to indicate whether their response can be
cached or not, how is the front-end going to determine that this response /is/ the same
as a previous one ?  It seems to me that such a determination would require elements that
such a proxy does not have, no ?

Now if you are still there, one more question :
Are we talking here of a configuration where one Apache front-end serves several 
Tomcats, possibly on different machines ?
Or does each Tomcat have its own personal Apache front-end on the same machine ?
Or something in-between ?

(*) considering the name of "filter" however, I would think that
- an "input filter" should always run /before/ any module which generates content (of 
which mod_proxy is one)
- an "output filter" should always run /after/ any modules which generate content.
So, it is probably difficult to have a filter which runs /in-between/ other Apache modules.
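
For completeness, a minimal sketch of what a mod_perl 2 output filter that touches the
response headers could look like. The module name My::DefaultCacheControl and the
max-age value are invented for illustration. Where such a filter ends up relative to
mod_cache's own filters depends on Apache's filter ordering and has not been verified
here.

```perl
package My::DefaultCacheControl;   # hypothetical module name

use strict;
use warnings;

use base qw(Apache2::Filter);      # needed for the filter attribute below

use Apache2::RequestRec ();        # provides $r->headers_out
use APR::Table ();                 # provides get/set on the header table
use Apache2::Const -compile => qw(DECLINED);

# Configured with, for example:
#   PerlOutputFilterHandler My::DefaultCacheControl
sub handler : FilterRequestHandler {
    my $f = shift;

    # Headers can only be changed before the first data is passed on,
    # so act on the first invocation for this request only.
    unless ($f->ctx) {
        my $headers = $f->r->headers_out;

        # Only supply a default if the back-end sent nothing itself.
        $headers->set('Cache-Control' => 'max-age=60')   # illustrative value
            unless defined $headers->get('Cache-Control');

        $f->ctx(1);
    }

    # Returning DECLINED lets mod_perl pass the data through unchanged.
    return Apache2::Const::DECLINED;
}

1;
```

The check for an existing Cache-Control header is there because of point (5): forcing a
caching header onto every response, whatever the application sent, is exactly the
indiscriminate approach to avoid.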
