Mailing-List: contact dev-help@cocoon.apache.org; run by ezmlm
Precedence: bulk
Reply-To: dev@cocoon.apache.org
To: dev@cocoon.apache.org
From: Sylvain Wallez <sylvain@apache.org>
Subject: Re: Improving HTTP protocol handling (Was: RE: Fooling around with
 cocoon davmap)
Date: Mon, 03 Nov 2003 12:45:15 +0100
Lines: 104
Message-ID: <bo5f4b$iv1$1@sea.gmane.org>
References: <84F0A43A4248CE45B5C0E20F4C40779C667C7E@naomi.webworks.nl>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii; format=flowed
Content-Transfer-Encoding: 7bit
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;
 rv:1.5) Gecko/20031007
In-Reply-To: <84F0A43A4248CE45B5C0E20F4C40779C667C7E@naomi.webworks.nl>
Sender: news <news@sea.gmane.org>

Unico Hommes wrote:

<snip/>

>>IMO, this should be handled at the pipeline level, i.e. on a HEAD request, the pipeline should be built and setup, but not executed. And this for several reasons:
>>- not every request is handled by flowscript
>>- some pipeline components set response headers, such as the i18n transformer or the browser selector.
>>- if we use the pipeline key as the Etag (see below), the pipeline must be built and setup to compute that key.
>>    
>>
>
>Good point, we need to do that too, but not having to send a page from the flow could also help us in other situations where we don't need access to the pipeline. Think OPTIONS, TRACE, MKCOL, PUT, etc. Or do you think these should also be handled at the pipeline level?
>  
>

HEAD is a bit special here since it can be considered as a 
"stripped-down" version of GET and as such doesn't require special 
application-level handling.
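
To make the "built and setup, but not executed" idea concrete, here is a
minimal sketch. It is written in Python rather than Cocoon's Java, and the
Pipeline class, its setup/execute methods and the process function are all
hypothetical names used for illustration only, not existing Cocoon API.

```python
class Pipeline:
    """Hypothetical stand-in for a sitemap pipeline; not Cocoon's API."""

    def __init__(self, source):
        self.source = source
        self.executed = False
        self.cache_key = None

    def setup(self):
        # Gather everything that does not need the body: here, only a key.
        self.cache_key = "key:" + self.source

    def execute(self):
        # Produce the response body (the expensive part).
        self.executed = True
        return ("<page>" + self.source + "</page>").encode("ascii")


def process(method, pipeline):
    """Set up the pipeline for every method; execute only if a body is needed."""
    pipeline.setup()
    if method == "HEAD":
        # Setup-time information is available, but nothing was executed.
        return None
    return pipeline.execute()
```

With this shape, a HEAD request still gets whatever is known at setup time
(the cache key in this sketch), while the body-producing step is skipped.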

Other methods need to trigger some application-specific behaviour that 
must be handled somehow. But I see your point: some methods don't ask for a 
response body. We currently have no way to express this, as the sitemap 
engine throws a ResourceNotFoundException (RNFE, and hence a 404) if no 
pipeline was built.

To express this body-less response, several solutions come to mind:
- have a "null-reader" that allows building a pipeline that sends nothing
- have some new method on the environment stating that no body is to be 
produced. But this requires a new sitemap statement.
- redirect to a special protocol ("null-body:"?) that indicates a 
body-less answer.

The first two solutions have the drawback of requiring some matching in 
the pipeline just to say that we don't want to generate a response body. 
This is useless (and CPU-consuming) if the request handling is done in a 
flowscript.

The third solution (redirect) has the advantage of not adding a new 
sitemap statement and of being available at no extra cost from a 
flowscript (or an action). But it sounds a bit hacky.
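
As a rough illustration of the redirect variant, the sketch below shows how
a request handler could short-circuit on such a redirect before any pipeline
is built. It is Python, not Cocoon code: the handle function, the
build_pipeline callable and the ResourceNotFound class are hypothetical, and
"null-body:" is only the tentative protocol name floated above.

```python
NULL_BODY_SCHEME = "null-body:"  # tentative name from this mail, not a real protocol


class ResourceNotFound(Exception):
    """Stand-in for the exception that ends up as a 404."""


def handle(redirect_uri, build_pipeline):
    """Return the response body, or None for a body-less answer.

    redirect_uri: where the flowscript or action redirected to, or None.
    build_pipeline: callable returning a body-producing callable, or None
    when nothing in the sitemap matches.
    """
    if redirect_uri is not None and redirect_uri.startswith(NULL_BODY_SCHEME):
        # Body-less answer: no matching, no pipeline, and no 404.
        return None
    pipeline = build_pipeline()
    if pipeline is None:
        raise ResourceNotFound("no pipeline was built")
    return pipeline()
```

The point of the sketch is only that the body-less case is decided before
any matching happens, which is what makes it cheap from a flowscript.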

What do you think?

>>Note that this pipeline-level handling is different from fooling the serializer by sending its output to /dev/null, since the processing chain is setup to get all required information, but not executed.
>>
>>Actually, this is not very different from what happens today when content is retrieved from the cache (pipeline is built and setup but not executed).
>>    
>>
>
>OK. Are you saying then that the pipelines should be handling more low level HTTP methods? Or do you see some other specialized component handling this?
>  
>

Maybe just HEAD (see above).

>>>>BTW, can someone explain to me what ETags are about (I read about them in the HTTP RFC a long time ago, but did not really understand at that time).
>>>>        
>>>>
>>>I just looked. It seems entity tags are used as cache validators, similar to the Last-Modified header I guess, i.e. they encode the state of a resource entity so that clients can optimize network calls by sending along headers like If-Match, If-None-Match, If-Range, that are then checked against the current value of the entity tag on the server. If they match (or not) the method is executed. At least that's what I got out of it.
>>>      
>>>
>>Don't really understand what resource _entity_ means, 
>>    
>>
>
>   "entity
>      The information transferred as the payload of a request or
>      response. An entity consists of metainformation in the form of
>      entity-header fields and content in the form of an entity-body, as
>      described in section 7."
>  
>

Ah, ok. So nothing new, actually ;-)

>>but it looks like the pipeline cache key could be used for the ETag. 
>>What do you think?
>>
>
>I think so. The spec talks about weak and strong entity tags. I would say the pipeline cache key qualifies as a weak one. Weak keys only approximate semantic equivalence whereas strong keys reflect the verbatim response.
>

So strong keys can be e.g. the MD5 signature of the response body?
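
To illustrate the distinction, here is a small sketch in Python. It assumes
a strong tag is an MD5 digest of the response bytes and a weak tag is
derived from the pipeline cache key (represented here as a plain string);
the function names are hypothetical. The comparison ignores the W/ prefix,
i.e. it is the weak comparison that is acceptable for a simple GET or HEAD,
and splitting the header on commas is a simplification that only works
because these tags never contain commas.

```python
import hashlib


def strong_etag(body):
    """Strong validator: changes whenever the response bytes change."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def weak_etag(cache_key):
    """Weak validator derived from a (hypothetical) pipeline cache key."""
    digest = hashlib.md5(cache_key.encode("utf-8")).hexdigest()
    return 'W/"' + digest + '"'


def not_modified(if_none_match, current_etag):
    """Weak comparison of an If-None-Match header against the current tag."""

    def opaque(tag):
        # Drop surrounding whitespace and the weakness marker, if any.
        tag = tag.strip()
        return tag[2:] if tag.startswith("W/") else tag

    if if_none_match.strip() == "*":
        return True
    candidates = [opaque(t) for t in if_none_match.split(",")]
    return opaque(current_etag) in candidates
```

A weak tag can be computed at setup time from the key alone, whereas the
strong tag in this sketch needs the full body, so it can only be produced
after the pipeline has been executed at least once.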

>Because although the pipeline output may stay the same it doesn't include information about the values of the response headers, and because the validity object the pipeline gets from the pipeline components doesn't state that the content wouldn't be different if the pipeline were executed again, just that it shouldn't execute the pipeline.
>  
>

Mmmh... If this isn't true, then we have a serious problem, because the 
pipeline is not executed if the validity is valid. Or did I miss 
something?

Also, the rule for pipeline components should be that entity-header 
related headers (e.g. the Vary header set by the browser selector) should 
be set at pipeline setup time, while entity-body related headers (e.g. 
Content-Length) should be set at pipeline execution time.
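
A minimal sketch of that rule, again in Python with hypothetical class
names (Response, BrowserSelector and Serializer below are illustrations,
not Cocoon's components), and with a hard-coded body for brevity:

```python
class Response:
    """Hypothetical response object that only collects headers."""

    def __init__(self):
        self.headers = {}


class BrowserSelector:
    """Hypothetical component whose choice depends on the User-Agent."""

    def setup(self, response):
        # Known without producing the body, so set at setup time.
        response.headers["Vary"] = "User-Agent"


class Serializer:
    """Hypothetical component that produces the body."""

    def setup(self, response):
        response.headers["Content-Type"] = "text/html"

    def execute(self, response):
        body = b"<html></html>"
        # Only known once the body exists, so set at execution time.
        response.headers["Content-Length"] = str(len(body))
        return body
```

If components follow this split, a response built from setup alone carries
Vary and Content-Type but no Content-Length.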

Sylvain

-- 
Sylvain Wallez                                  Anyware Technologies
http://www.apache.org/~sylvain           http://www.anyware-tech.com
{ XML, Java, Cocoon, OpenSource }*{ Training, Consulting, Projects }
Orixo, the opensource XML business alliance  -  http://www.orixo.com