httpd-dev mailing list archives

From Igor Sysoev
Subject Re: mod_proxy Cache-Control: no-cache=<directive> support Apache1.3
Date Sat, 02 Mar 2002 15:39:48 GMT
On Fri, 1 Mar 2002, Graham Leggett wrote:

> Igor Sysoev wrote:
> > mod_proxy cannot do many things that mod_accel can. Some of
> > them can be easily implemented, some not.
> Keep in mind that mod_proxy is exactly that - a proxy. It does not try
> to duplicate functionality that is performed by other parts of Apache.
> (This is the main reason mod_proxy and mod_cache were separated from
> each other in v2.0)

mod_accel is not a proxy. It's an accelerator. It cannot work as an
ordinary proxy. I did not even try to implement that - Apache 1.3 is a
poor proxy. Squid or Oops are much better.

> > mod_accel can:
> > 
> > *) ignore headers like 'Pragma: no-cache' and 'Authorization'.
> This is the job of mod_headers, not mod_proxy.
> However: ignoring headers violates the HTTP protocol and is not
> something that should be included in a product that claims to be as HTTP
> compliant as possible. If you want to cache heavy data sources, use the
> Cache-Control header correctly, or correct the design of the application
> so as to be less inefficient.

mod_accel can ignore the client's 'Pragma: no-cache' and
'Cache-Control: no-cache'. These headers are sent if you press the
Reload button in Netscape or Mozilla. By default, if mod_accel gets
these headers, it does not look in the cache but sends the request to
the backend. The webmaster can set 'AccelIgnoreNoCache on' if he is sure
that the backend does not give fresh data and such requests only
overload the backend.

As for 'Authorization', mod_accel by default sends this header
to the backend and never caches such answers. The webmaster can set
'AccelIgnoreAuth on' if the backend never asks for authorization but
the client sends 'Authorization' anyway - in that case 'Authorization'
is simply a very powerful 'no-cache' header.
I know at least one download utility, FlashGet, that sends the name and
password for anonymous FTP access in the 'Authorization' header.
It's probably a bug in FlashGet, but this bug effectively trashes the
cache and the backend.
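
The two switches described above can be read as a small decision
function. The following is only a sketch under stated assumptions, not
mod_accel's actual code: the function name, the parameter names and the
returned pair are hypothetical, and whether a reloaded response is
stored afterwards is an assumption that the text does not confirm.

```python
def cache_policy(headers, ignore_no_cache=False, ignore_auth=False):
    """Return (use_cache_lookup, may_store_response) for one client request.

    ignore_no_cache / ignore_auth are hypothetical stand-ins for
    'AccelIgnoreNoCache on' and 'AccelIgnoreAuth on'.
    """
    # Header names are case-insensitive, so normalize them first.
    names = {k.lower(): v for k, v in headers.items()}

    # By default a request carrying 'Authorization' bypasses the cache
    # and its answer is never cached.
    if "authorization" in names and not ignore_auth:
        return (False, False)

    pragma = names.get("pragma", "").lower()
    cache_control = names.get("cache-control", "").lower()
    directives = {d.strip() for d in cache_control.split(",")}
    client_no_cache = "no-cache" in pragma or "no-cache" in directives

    # By default a client reload skips the cache lookup. Storing the
    # fresh answer afterwards is an assumption of this sketch.
    if client_no_cache and not ignore_no_cache:
        return (False, True)

    return (True, True)
```

With both switches on, a request that carries 'Authorization' and
'Pragma: no-cache' is served like any other cacheable request.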

Yes, of course all these directives work at the Location and Files level.

> > *) log its results.
> In theory mod_proxy (and mod_cache) should allow results to be logged
> via the normal logging modules. If this is not yet possible, it should
> be fixed.

In theory but not in practice.

> > *) pass cookies to backend even if the response can be cached.
> Again RFC2616 dictates how this should be done - proxy should support
> the specification.

As I said, mod_accel is not a proxy.
By default mod_accel does not send cookies to the backend if the
response can be cached. But the webmaster can set 'AccelPassCookie on'
and all cookies go to the backend. The backend is then responsible for
controlling which answers should be cached and which should not.
In any case, 'Set-Cookie' headers never go into the cache.
This directive works at the Location and Files level.

> > *) taking cookies into account while caching responses.
> > 
> > *) mod_accel has AccelNoPass directive.
> What does this do?
> If it allows certain parts of a proxied URL space to be "not-proxied",
> then the following will achieve this effect:
> ProxyPass /blah http://somewhere/blah
> ProxyPass /blah/somewhere/else !
> Everything under /blah is proxied, except for everything under
> /blah/somewhere/else.

Yes. But is '!' already implemented?
I use a different syntax:

AccelPass     /     http://backend/
AccelNoPass   /images  /download  ~*\.jpg$

> > *) proxy mass name-based virtual hosts with one directive on frontend:
> >    AccelPass   /    [PH]
> >    [PH] means preserve hostname, i.e. request to backend would go with
> >    original 'Host' header.
> mod_accel does this in one directive, mod_proxy does it in two - but the
> effect is the same. Should we consider adding a combined directive to
> mod_proxy the same way mod_accel works...?

What are the two mod_proxy directives?
As far as I know, mod_proxy always changes the 'Host' header.

> > *) resolve backend on startup.
> This is a good idea.

mod_accel does it by default. You can disable it with the [NR] flag
in the AccelPass directive.

> > *) make simple fault-tolerance with dns-balanced backends.
> mod_proxy does this already.

No. mod_proxy tries to, but the code is broken. If a connection fails,
it tries to connect again with the same socket. It should create a new
socket. Anyway, mod_accel tries another backend if the connection
failed, if the backend has not sent a header, or if the backend has sent
a 5xx response.
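
The point about creating a new socket for every attempt can be
sketched as follows. This is an illustration only, not code from
mod_accel or mod_proxy: the names dial and connect_with_failover are
hypothetical, and the list of addresses is assumed to come from
resolving a DNS-balanced backend name. Retrying on a missing header or
a 5xx response is left out.

```python
import socket


def dial(addr, timeout):
    """Open a brand-new TCP socket to addr, with a connect timeout."""
    return socket.create_connection(addr, timeout=timeout)


def connect_with_failover(addrs, timeout=5.0, dialer=dial):
    """Try each backend address in turn and return the first connection.

    Every attempt goes through dialer(), which creates a fresh socket,
    so a socket that already failed is never reused.
    """
    last_error = None
    for addr in addrs:
        try:
            return dialer(addr, timeout)
        except OSError as exc:  # refused, unreachable, timed out, ...
            last_error = exc
    raise ConnectionError("no backend reachable") from last_error
```

The dialer parameter exists only so the loop can be exercised without
a network. The timeout argument also covers the connect timeout
discussed further down in this message.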

> > *) use timeout when it connects to backend.
> mod_proxy should do this - if it doesn't, it is a bug.

mod_proxy does not.

> > *) use temporary file for buffering client request body (there is patch
> >    for mod_proxy).
> What advantage does this give?

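Suppose a slow client (3K/s) POSTs a 10K form. Without buffering, the
backend is busy for about 3 seconds while the body arrives. Now suppose
the client uploads a 100K file: at the same speed the backend would be
tied up for more than 30 seconds.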

> > *) get backend response as soon as possible even if it's very big.
> >    mod_accel uses temporary file for buffering backend response if
> >    response does not fit in mod_accel configurable buffer.
> This kind of thing is fixed in v2.0 in mod_cache. It is too big an
> architecture change for the v1.3 proxy.

mod_accel can send part of the answer to the client even if the backend
has not yet sent the whole answer. But even in this case a slow client
never blocks the backend - I use nonblocking operations and select().
Would this be possible with mod_cache?

> > *) use busy locks. If there are several identical requests to backend
> >    then only one of them would go to backend during specified time.
> > 
> > *) limit concurrent connections and waiting processes on per-backend
> >    or per Location basis.
> This is not the job of mod_proxy, but the job of a separate module.
> Both busy locks and limiting concurrent connections can be useful in a
> normal Apache server using mod_cgi, or one of the Java servlet
> connectors. Adding this to proxy means it can only be used in proxy -
> which is a bad idea.

Probably, but Apache 1.3.x has no such module and I needed it badly
in mod_accel.
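
The busy-lock idea quoted above can be sketched like this. It is a
single-process illustration only, not mod_accel's implementation:
Apache 1.3 runs many processes, so a real version would need shared
state, and the class name, method names and the injectable clock are
all hypothetical.

```python
import time


class BusyLocks:
    """Let only one request per key through during hold_seconds."""

    def __init__(self, hold_seconds, clock=time.monotonic):
        self.hold_seconds = hold_seconds
        self.clock = clock
        self._locked_until = {}  # key (e.g. URL) -> expiry timestamp

    def try_acquire(self, key):
        """Return True if the caller may send this request to the backend."""
        now = self.clock()
        expires = self._locked_until.get(key)
        if expires is not None and expires > now:
            return False  # an identical request is already in flight
        self._locked_until[key] = now + self.hold_seconds
        return True

    def release(self, key):
        """Drop the lock early, e.g. once the response has been cached."""
        self._locked_until.pop(key, None)
```

A request that fails try_acquire() would wait or be served from the
cache, depending on the configuration. The lock expires on its own
after hold_seconds, so a stuck backend request cannot block identical
requests forever.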

> > *) mod_accel has mod_randban module that allows randomizing some
> >    part of content. For example it can replace the number '11111' in
> >    <img src="http://host/path1?place=1&key=1234&rand=11111">
> >    with random value.
> This is the job of mod_rewrite.

mod_rewrite cannot do it.
Suppose we cache a response containing a banner's or counter's URL.
If the client reloads the page, it gets the same URL in the content and
does not load the banner. mod_randban changes the content on the fly.
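
Rewriting the body of a cached response on the fly can be sketched as
follows. This is not mod_randban's code or configuration: the function
name, the hard-coded 'rand=' parameter and the six-digit range are
assumptions made for the illustration, based only on the example URL
quoted above.

```python
import random
import re

# Matches the 'rand=<digits>' query parameter from the quoted example.
RAND_PARAM = re.compile(r"(\brand=)\d+")


def randomize_banner_urls(html, rng=random):
    """Replace every rand=<digits> value in html with a fresh random number."""
    def substitute(match):
        # Keep the 'rand=' prefix, swap only the digits.
        return match.group(1) + str(rng.randrange(10**5, 10**6))
    return RAND_PARAM.sub(substitute, html)


cached = '<img src="http://host/path1?place=1&key=1234&rand=11111">'
print(randomize_banner_urls(cached))
```

The cached copy stays unchanged. Only the bytes sent to each client
differ, so the browser sees a new banner URL on every reload.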

Igor Sysoev
