cocoon-dev mailing list archives

From: Stefano Mazzocchi <stef...@apache.org>
Subject: Re: Fooling around with cocoon davmap
Date: Mon, 03 Nov 2003 11:38:49 GMT

On Monday, Nov 3, 2003, at 09:52 Europe/Rome, Sylvain Wallez wrote:

> Unico Hommes wrote:
>
> <snip/>
>
>>>> CallFunctionNode.java ln 166/184:
>>>> // FIXME (SW) : is a flow allowed not to redirect ?
>>>>
>>>> ;-D
>>>>
>>> Uh (again)? I'm wondering if there's not a misunderstanding here: 
>>> this FIXME is about knowing if a flowscript is allowed to terminate 
>>> without stating what page is to be displayed, i.e. check if one of 
>>> sendPage(), sendPageAndWait() or redirectTo() was called.
>>>
>>> Sorry, but I don't see how this relates to HEAD, ETags et al... What 
>>> was the change you proposed to do?
>>>
>>
>> We were talking about the fact that it seemed impossible to serve a 
>> request without also sending an entity body along with the response 
>> (short of suppressing the output in the serializer, which is more of a 
>> hack than a solution). I thought it was allowed to call a flow 
>> function and then not send a page, but apparently I was wrong. Stefano 
>> agreed that it should be legal to call a flow function that does not 
>> redirect to a page, in order to cover the full range of HTTP better.
>>
>> Specifically, we were discussing the specification of the OPTIONS 
>> method, which prescribes that "the response MUST NOT include entity 
>> information other than what can be considered as communication 
>> options", which seems to rule out sending a regular page body in 
>> such a response.
>>
>> I traced the above location as the place the code would need to be 
>> changed in order to achieve this. But I could be wrong.
>>
>
> Sorry to say that, but... yes, I think so ;-)
>
> IMO, this should be handled at the pipeline level, i.e. on a HEAD 
> request, the pipeline should be built and set up, but not executed. 
> This is for several reasons:
> - not every request is handled by flowscript
> - some pipeline components set response headers, such as the i18n 
> transformer or the browser selector.
> - if we use the pipeline key as the ETag (see below), the pipeline 
> must be built and set up to compute that key.

Not really. There are many cases in WebDAV/DeltaV/DASL where an HTTP 
request doesn't really generate content... but needs a lot of 
procedural logic to handle it.

If you think of DeltaV, for example, methods such as VERSION-CONTROL, 
UPDATE, CHECKOUT, CHECKIN and so on don't require you to return 
anything more than a bunch of headers.

In this case, resorting to a pipeline is clearly overkill and we would 
simply like to call a flowscript function that does something, sets a 
bunch of headers and then, simply, terminates without calling any 
pipeline.
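To make the idea concrete, here is a minimal plain-JavaScript sketch (not Cocoon's flowscript API) that models a response as a status, a set of headers and an optional body. The names makeResponse, handlers, allowedMethods and dispatch are hypothetical; the point is only that a method such as OPTIONS can be answered completely with a status line and headers, with no entity body and no pipeline.

```javascript
// Minimal model of an HTTP response: status, headers and an optional body.
// A null body means "send headers only"; no content pipeline is involved.
function makeResponse(status, headers, body) {
  return { status: status, headers: headers, body: body === undefined ? null : body };
}

// Hypothetical dispatch table from HTTP method name to handler function.
const handlers = {
  OPTIONS: function () {
    // Everything the client needs is carried in the headers.
    return makeResponse(200, {
      "DAV": "1",                 // advertise WebDAV class 1 compliance
      "Allow": allowedMethods(),  // methods this resource accepts
      "Content-Length": "0"       // explicitly no entity body
    });
  }
};

// The methods we accept are exactly those with a registered handler.
function allowedMethods() {
  return Object.keys(handlers).join(", ");
}

function dispatch(method) {
  const handler = Object.prototype.hasOwnProperty.call(handlers, method)
    ? handlers[method]
    : null;
  if (!handler) {
    // Even an error can be signalled with a status and headers alone.
    return makeResponse(405, { "Allow": allowedMethods(), "Content-Length": "0" });
  }
  return handler();
}
```

Here dispatch("OPTIONS") yields status 200 with DAV and Allow headers and a null body, and an unregistered method yields 405 with an Allow header, again without a body.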

I don't know if Unico is right in pointing out that location, but that is 
a separate concern: I think the above requirement is a big one, and if we 
don't allow this kind of execution, we might end up with extremely poor 
performance in webdavapps.

[I've done *extensive* tracing on how webdav/deltav/subversion works on 
the wire... boy, webdav *IS* verbose already and generates tons of 
requests/responses... it is painful to see every 404 carrying tens of 
kilobytes of payload... especially when simply browsing around 
generates tons of them for every PROPFIND]

We must realize that the world of HTTP doesn't stop at GET/POST!!!

> Note that this pipeline-level handling is different from fooling the 
> serializer by sending its output to /dev/null, since the processing 
> chain is set up to get all required information, but not executed.

It seems like a waste of resources to me to set up a pipeline and then 
not use it. But I don't understand... if I have

  <match>
   <call function="blah"/>
  </match>

and then

function blah() {
   // advertise WebDAV class 1 compliance in the "DAV" response header
   cocoon.response.setHeader("DAV", "1");
   // does *NOT* call sendPage*
}

where is the pipeline created?

> Actually, this is not very different from what happens today when 
> content is retrieved from the cache (the pipeline is built and set up but
> not executed).

This is different. The sitemap doesn't know, in advance, that no 
pipeline will be called.

>
>>> BTW, can someone explain to me what ETags are about? (I read about 
>>> them in the HTTP RFC a long time ago, but did not really understand 
>>> them at the time.)
>>>
>>
>> I just looked. It seems entity tags are used as cache validators, 
>> similar to Last-Modified header I guess, i.e. they encode the state 
>> of a resource entity so that clients can optimize network calls by 
>> sending along headers like If-Match, If-None-Match, If-Range, that 
>> are then checked against the current value of the entity tag on 
>> the server. If they match (or not) the method is executed. At least 
>> that's what I got out of it.
>>
>
> Don't really understand what resource _entity_ means, but it looks 
> like the pipeline cache key could be used for the ETag. What do you 
> think?

Not really. The cache "key" is attached to a "versionable resource", 
while the ETag is attached to the "resource entity", which means that the 
ETag is a unique and *permanent* identifier for that particular instance 
of that versionable resource. It is like the "version identifier" of 
that particular resource.

For example, using URIs, if

  http://host/path/file

is a versionable resource (meaning that we keep track of its versions; 
in DeltaV terminology, it has been "put under version control"), then

  http://host/path/file/1.0.000

could be the URI of one of its versions, and it is equivalent to that 
version's ETag... no matter what happens in the future, that resource 
is *immutable*. This is, for example, how Subversion works. Note that 
ETags are very useful for proxies: an immutable resource can be cached 
*forever*.
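A minimal sketch of the idea above, assuming each immutable version is identified by a label such as 1.0.000: the ETag is derived from the version identifier, so it never changes for that version, and a conditional GET carrying a matching If-None-Match header can be answered with 304 Not Modified and no body. The functions etagFor, parseEtagList and conditionalGet and the { id, content } version object are hypothetical; this is plain JavaScript, not a Cocoon or Subversion API.

```javascript
// Strong ETag derived from the version identifier: the same version
// always yields the same tag, because the version itself never changes.
function etagFor(versionId) {
  return '"' + versionId + '"';
}

// Split an If-None-Match header value into a list of entity tags.
// Simplified: assumes no commas inside tags, and weak tags (W/"...")
// are compared as plain strings, so they never match a strong tag.
function parseEtagList(headerValue) {
  return headerValue.split(",").map(function (t) { return t.trim(); });
}

// Conditional GET for one immutable version of a resource.
// `version` is a hypothetical object of the form { id, content }.
function conditionalGet(version, ifNoneMatch) {
  const etag = etagFor(version.id);
  const headers = {
    "ETag": etag,
    // Immutable version: let caches keep it for a long time (one year here).
    "Cache-Control": "max-age=31536000"
  };
  if (ifNoneMatch !== undefined) {
    const tags = parseEtagList(ifNoneMatch);
    if (tags.indexOf("*") !== -1 || tags.indexOf(etag) !== -1) {
      // The client's copy is current: headers only, no entity body.
      return { status: 304, headers: headers, body: null };
    }
  }
  return { status: 200, headers: headers, body: version.content };
}
```

For a version { id: "1.0.000", ... }, a plain GET returns 200 with ETag "1.0.000", while a GET carrying If-None-Match: "1.0.000" returns 304 with no body.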

ETags are also useful for the lost update problem when no locking 
mechanism is in place:

  person A does "GET / HTTP/1.1", obtains the page and an ETag
  person B does "GET / HTTP/1.1", obtains the same page and the same ETag
  person A modifies the page (say, in linotype)
  person B modifies the page as well (say, in OpenOffice)
  person A does "PUT / HTTP/1.1" with an "If-Match" header carrying the
    previous ETag
  [the server sees that the ETag matches, saves the page, and the
    ETag changes]
  person B does "PUT / HTTP/1.1" with an "If-Match" header carrying the
    ETag it originally got
  [the server returns 412 PRECONDITION FAILED because the ETag doesn't
    match]

see http://www.w3.org/1999/04/Editing/ for more info on this

[At this point, it's up to the user-agent software to know what to do]
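The lost-update scenario can be sketched as an in-memory store that checks If-Match before accepting a PUT. This is a hypothetical plain-JavaScript illustration (createStore, get and put are made-up names, not a Cocoon API), assuming a simple counter-based ETag scheme and a single tag per If-Match header. Per RFC 2616 (section 14.24), a request whose If-Match tags do not match the current entity tag is rejected with 412 Precondition Failed.

```javascript
// Hypothetical in-memory resource store keyed by path. Each entry carries
// its content and a strong ETag that changes on every successful write.
function createStore() {
  const entries = {};
  let counter = 0;

  function get(path) {
    const e = entries[path];
    return e
      ? { status: 200, headers: { "ETag": e.etag }, body: e.content }
      : { status: 404, headers: {}, body: null };
  }

  // PUT with an optional If-Match value (a single ETag, for simplicity).
  function put(path, content, ifMatch) {
    const current = entries[path];
    if (ifMatch !== undefined && (!current || current.etag !== ifMatch)) {
      // Someone changed (or removed) the resource since this client read it.
      return { status: 412, headers: {}, body: null };
    }
    counter += 1;
    const etag = '"v' + counter + '"'; // new entity tag for the new entity
    entries[path] = { content: content, etag: etag };
    // 201 Created for a new resource, 204 No Content for a replacement.
    return { status: current ? 201 - 201 + 204 : 201, headers: { "ETag": etag }, body: null };
  }

  return { get: get, put: put };
}
```

Replaying the scenario: A and B both read the same ETag; A's PUT with that ETag succeeds with 204 and changes the tag, so B's PUT with the now stale ETag gets 412 and A's content is preserved.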

I'm diving deeper and deeper into this stuff (also because of JSR 170) 
and the more I look into it, the more I think we are generally too 
ignorant of how HTTP really works. We use, say, 5% of HTTP and its 
friends... everything else is considered black magic and reinvented 
every time. This normally results in massive performance and 
scalability limitations.

But for Doco, I'm going to spend a serious effort to make things work 
the way the HTTP spec says.... in order to please the HTTPD people and 
in order to show that, no matter what web technology you use, if you 
know how network architectures operate, you scale massively.

But I'm going to fill this gap and, hopefully, influence you people 
back ;-)

And the work on the davmap is, IMO, going to trigger a lot of 
interesting redesigns in the internals.

--
Stefano.

