httpd-dev mailing list archives

From Graham Leggett <minf...@sharp.fm>
Subject Re: The Byterange filter -- a new design -- feel free to rip it to shreds
Date Tue, 13 Jul 2004 13:44:50 GMT
Geoffrey Young wrote:

> please take the rest of this as just a friendly discussion - I don't want it
> to turn into some kind of bickering match, since that's definitely not what
> I have in mind :)

Cool, no problem - this is quite a complex thing, and I was struggling 
to make clear exactly what needed to be done and where (and why).

> ok, that isn't the idea I had about output filters at all.  my own concept
> of how this all worked (or should work) is that content handlers are
> supposed to just generate content.  specifically, they should not care at
> all about RFC compliance - this is why we have a separate header filter,
> byterange filter, and so on (and why I think ap_set_last_modified foo should
> be in its own filter ;)

In terms of very simple content handlers, such as a handler that might 
serve content stored in a file on disk, the above is true - it doesn't 
care much about HTTP, that is mostly handled by higher layers.

The problem starts creeping in when the content handler is less trivial 
than the file serving handler, such as mod_proxy, which receives an HTTP 
request from the input filter stack, and returns an HTTP response to the 
output filter stack based on content and headers generated by a backend 
server.

In this case, we're not just feeding content up the stack, but content 
_and_ HTTP headers. Filters cannot ignore the headers, otherwise broken 
behaviour is the result. A classic example is a filter that changes the 
length of the content (mod_gzip, or mod_include). These filters need to 
concern themselves with the HTTP Content-Length header, otherwise a 
response from mod_proxy going up the stack could get shipped to the 
browser with the wrong Content-Length.
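
To make the Content-Length point concrete, here is a minimal sketch in 
Python rather than the httpd filter API. The name gzip_filter and the 
plain dict of headers are made up for illustration; a real filter works 
on brigades and the request's output headers, and header names would 
need case-insensitive handling.

```python
import gzip


def gzip_filter(headers, body):
    """Compress the body and fix up the headers that describe it.

    headers: dict of response headers (hypothetical stand-in for the
             real output header table; keys are matched case-sensitively)
    body:    the complete response body as bytes
    """
    out = dict(headers)  # do not mutate the caller's headers
    compressed = gzip.compress(body)
    out["Content-Encoding"] = "gzip"
    # The old Content-Length described the uncompressed body. Drop it so
    # that a later stage can chunk the response or compute a new length.
    out.pop("Content-Length", None)
    return out, compressed
```

If the filter left Content-Length alone here, the browser would be told 
to expect the uncompressed size while receiving the compressed bytes.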

In most cases, handling the headers is trivial for a filter. mod_gzip 
might strip off the Content-Length header in the hope that a later 
filter chunks the response. mod_include should (in the simplest case) 
strip off any Range header in the request in the hope that the byte 
range filter handles the range request further along.

But in the case of mod_proxy, mod_jk, etc it is quite valid and very 
desirable for a range request to be passed all the way to the backend, 
in the hope that the backend sends just that range back to mod_proxy, 
which in turn sends it up a filter stack that isn't going to fall over 
because it received a 206 Partial Content response.
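
As a rough sketch of what "not falling over" on a 206 could look like, 
here is a byte range step in Python that leaves the response alone when 
the handler has already produced a partial response, and only slices a 
full 200 response itself. Everything here (parse_range, 
byterange_filter, the dict headers, single ranges only) is an 
assumption for illustration and not the httpd implementation.

```python
def parse_range(value, size):
    """Parse one 'bytes=first-last' range against a body of `size` bytes.

    Returns (start, end) inclusive, or None if the value is not a single
    satisfiable byte range. Multiple ranges are not handled in this sketch.
    """
    if not value.startswith("bytes=") or "," in value:
        return None
    first, sep, last = value[len("bytes="):].partition("-")
    if not sep:
        return None
    try:
        if first == "":
            # Suffix form 'bytes=-N': the last N bytes of the body.
            n = int(last)
            if n <= 0 or size == 0:
                return None
            return max(size - n, 0), size - 1
        start = int(first)
        end = int(last) if last else size - 1
    except ValueError:
        return None
    if start >= size or end < start:
        return None
    return start, min(end, size - 1)


def byterange_filter(status, req_headers, resp_headers, body):
    """Apply a Range request to a full response, or pass a 206 through."""
    # Anything other than a plain 200 (for example a 206 that came back
    # from the backend via the proxy) is left untouched.
    if status != 200 or "Range" not in req_headers:
        return status, resp_headers, body
    rng = parse_range(req_headers["Range"], len(body))
    if rng is None:
        return status, resp_headers, body
    start, end = rng
    out = dict(resp_headers)
    out["Content-Range"] = "bytes %d-%d/%d" % (start, end, len(body))
    out["Content-Length"] = str(end - start + 1)
    return 206, out, body[start:end + 1]
```

The point of the early return is that a handler which did the range 
work itself is not penalised for it further along the filter chain.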

> that's true if I'm wrong about the assumption above.  but in my mind, the
> filter API is the most useful if content handlers (and content-altering
> filters) can remain ignorant of 206 responses and the byterange filter can
> bat cleanup.

For simplicity's sake the above is a noble goal - but one with some 
significant performance drawbacks in many real world applications.

Apart from the mod_proxy case, think of a webserver (or bank of 
webservers) serving content hosted on an NFS server. The entire 650MB 
ISO file (for example) needs to be transferred from the NFS server to 
the webserver for every hit to that file - even when a user is 
continuing a download (which in the case of a file the size of an ISO 
will likely be often).
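
The difference is easy to see in a small Python sketch. The helper 
name read_range is made up, and this only illustrates the idea of a 
handler that knows about the range and seeks to it, rather than 
generating the whole file and leaving a later filter to throw most of 
it away.

```python
def read_range(path, start, end):
    """Return bytes start..end (inclusive) of the file at `path`.

    Only the requested bytes are read from the file, so on a network
    filesystem only roughly that much data has to cross the wire.
    """
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start + 1)
```

A handler that ignores the Range header has to read from offset zero 
on every request, however little of the file the client still needs.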

> sure :)  I guess where we have different ideas, then, is in who exactly
> should be responsible for RFC compliance.  I had always assumed that there
> was (or should be) very little that a content handler needed to worry about
> in this respect, and that it was the job of the core server engine (via
> various early or late-running filters) to take care of things like HEAD
> requests, HTTP/0.9 requests/responses, chunked encoding, range requests, etc.

The above is still true - there is (and should be) very little for the 
content handler to worry about when it comes to HTTP compliance, and 
content handlers should have the option to just generate content, as 
they do now.

The problem though is not with the content handlers but with the 
filters - filters must not make the assumption that all content handlers 
only serve content and not HTTP headers. When a content handler decides 
that it wants to handle more of the HTTP spec so as to improve 
performance, it should be free to do so, and should not be stopped from 
doing so due to limitations in the output filters.

In other words, if mod_proxy is taught how to pass Range requests to the 
backend server, the output filter stack should not stop mod_proxy from 
doing so by removing Range headers unless it is absolutely necessary.
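
One way to picture "only when absolutely necessary" is the following 
Python sketch. The function prepare_backend_headers and the 
alters_content flag on each filter are hypothetical; nothing like this 
flag is claimed to exist in httpd, it is just a way to show the rule.

```python
def prepare_backend_headers(req_headers, filters):
    """Decide which request headers may be forwarded to the backend.

    req_headers: dict of request headers
    filters:     list of dicts describing the output filters on this
                 request, each with a hypothetical 'alters_content' flag
    """
    out = dict(req_headers)
    # A filter that rewrites the body (mod_include style) needs the full
    # response, so the Range header has to go and the range is applied
    # afterwards. Otherwise the backend can serve the range itself.
    if any(f.get("alters_content") for f in filters):
        out.pop("Range", None)
    return out
```

With no content-altering filter in the chain the Range header reaches 
the backend untouched, which is the case the proxy wants to be fast.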

Regards,
Graham
--
