From: "Peter Hunsberger" <peter.hunsberger@gmail.com>
To: dev@cocoon.apache.org
Date: Fri, 16 Feb 2007 09:20:20 -0600
Subject: Re: Postable source and servlet services problem

On 2/16/07, Grzegorz Kossakowski wrote:

> Firstly, I do not understand how the PRG pattern relates to the problem
> of whether POST requests are cacheable or not.

In this pattern, the implementation that is truest to the original HTTP
standard's intention is for every POST request to be answered with a 303
status response. The browser in turn issues a GET. No data is ever
returned directly in response to the POST.
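
To make that concrete, here is a minimal servlet-level sketch of the idea
(the class and helper names are invented for illustration; in Cocoon you
would of course do this from a pipeline or flowscript rather than a raw
servlet):

    import java.io.IOException;
    import javax.servlet.http.HttpServlet;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Sketch only: every POST is answered with 303 See Other, the browser
    // follows up with a GET, and the GET response is the only thing that
    // is ever rendered (and potentially cached).
    public class SearchServlet extends HttpServlet {

        protected void doPost(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // Process the submitted data, then redirect instead of rendering.
            String resultId = storeSearch(req.getParameter("query"));
            resp.setStatus(HttpServletResponse.SC_SEE_OTHER);   // 303
            resp.setHeader("Location", "result?id=" + resultId);
        }

        protected void doGet(HttpServletRequest req, HttpServletResponse resp)
                throws IOException {
            // This is the response the user actually sees; it can be cached
            // exactly like any other GET.
            resp.setContentType("text/html");
            resp.getWriter().println(renderResult(req.getParameter("id")));
        }

        // Stand-ins for whatever storage/rendering the application does.
        private String storeSearch(String query) { return "42"; }
        private String renderResult(String id)   { return "<html>...</html>"; }
    }
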
From what I see you don't think there is a problem caching the response to
a GET?

> Secondly, in order to get a full understanding of your arguments I would
> like to ask you how this pipeline would be cached:
>
>
>
>
>
>
> Suppose that we have this http_post generator that parses (as XML) the
> body of the POST request. Of course this pipeline will work correctly
> only for POST requests, so suppose we have one. My question is: how
> could the cache key and cache validity object be created for this kind
> of generator? Could you please provide a quite detailed description, as
> I would like to understand this issue.

It's generated in exactly the same way as any other cache key. It depends
completely on the internal implementation of the generator and
transformer(s) and on what they consider to affect the cacheability of the
results they produce.

Consider first the case of a search form where no data is present in the
initial presentation of the form. The only requirement here is that the
form can be uniquely identified with respect to the cache key (for example
"patient.search", to take an example from our system). Now consider the
same form after it has been filled out by the user but has errors on
submission and has to be redisplayed. The final result is generally not
cached; the combination of the form and the data to be presented is
essentially a unique instance and there is no point in caching it.

On the Cocoon side you have a couple of ways to handle this. In our case,
the basic form (with no data) is generated in exactly the same way in both
cases, with the same cache key. However, we aggregate that result with
another generator that generates any data to be presented within the form.
When no data is present, a constant cache key is generated and a simple
SAX wrapper around what is essentially a null result is cached (which may
be used across many different form combinations). When data is present,
this particular generator always returns a null cache key and the data is
not cached (or the key points to a validity that will return false for the
validity check).

The results of the cached form and the sometimes-cached data now have an
aggregate cache key: in one case it is valid and everything in the
pipeline can be cached; in the other case the aggregate key is not valid
and the final results of the pipeline are not cached (even though partial
SAX streams inside the pipeline are). If a user POSTs an empty search
form, the pipeline might produce exactly the same results as the original
GET that first generated the form; it's not the GET or POST that
determines the cacheability, it's the data that was generated in response
to them. There are other use cases where form data can be cached, but I
hope this helps?

FWIW, we are starting to move away from a standardized HTTP POST response
pattern and to implement pure AJAX-based forms where the data exchanges
are based on XMLHttpRequest interchanges. This separates the generation of
the form from the data handling completely; however, the basics of caching
remain the same: if the pipeline that responds to the XMLHttpRequest
decides that the output can be cached, it generates a key that uniquely
identifies the response. The same sub-pipeline generates the same results
for a GET, a POST or an XMLHttpRequest under the covers; it doesn't care
how the request originated...
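
To sketch what I mean in code, assuming Cocoon's
CacheableProcessingComponent contract (getKey()/getValidity()), the data
generator might look roughly like this; the class itself, the "form-id"
parameter and the hasSubmittedData() check are made up for the example:

    import java.io.IOException;
    import java.io.Serializable;

    import org.apache.cocoon.ProcessingException;
    import org.apache.cocoon.caching.CacheableProcessingComponent;
    import org.apache.cocoon.environment.ObjectModelHelper;
    import org.apache.cocoon.environment.Request;
    import org.apache.cocoon.generation.AbstractGenerator;
    import org.apache.excalibur.source.SourceValidity;
    import org.apache.excalibur.source.impl.validity.NOPValidity;
    import org.xml.sax.SAXException;
    import org.xml.sax.helpers.AttributesImpl;

    public class FormDataGenerator extends AbstractGenerator
            implements CacheableProcessingComponent {

        public Serializable getKey() {
            if (hasSubmittedData()) {
                // A filled-in form is a one-off combination: a null key tells
                // the pipeline not to cache this part, which in turn makes the
                // aggregate key for the whole response non-cacheable.
                return null;
            }
            // Empty form: a constant key (e.g. "patient.search") so the same
            // cached SAX fragment can be reused for every empty presentation.
            return parameters.getParameter("form-id", "patient.search");
        }

        public SourceValidity getValidity() {
            // The empty-form "null result" never changes, so it is always valid.
            return hasSubmittedData() ? null : new NOPValidity();
        }

        public void generate() throws IOException, SAXException, ProcessingException {
            contentHandler.startDocument();
            contentHandler.startElement("", "data", "data", new AttributesImpl());
            // ... emit either nothing or the submitted values as SAX events ...
            contentHandler.endElement("", "data", "data");
            contentHandler.endDocument();
        }

        private boolean hasSubmittedData() {
            Request request = ObjectModelHelper.getRequest(this.objectModel);
            return request.getParameter("query") != null;  // illustrative check
        }
    }

The form generator next to it would always return its constant key, and
the aggregation of the two keys becomes the cache key for the final
response, valid or not as described above.
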
> I do not like unclear situations so I will answer your doubts. You have
> said that I was suggesting that _validity objects_ can hang around while
> I was suggesting that _cache keys_ will hang around the request object.
> That's the reason for my "not really". Is it clear now?

Yes, makes sense, I was just reading too quickly...

> > You'd have to automatically add wrapper transformers that worked on
> > the sending and receiving pipelines (you can't require that they
> > inherit from some common abstract classes). The good thing is that
> > no new interfaces or components are required and the wrappers would be
> > rather trivial. The bad thing is that in some ways this is far more
> > hacky, since as I said, it's essentially magic (it's completely
> > hidden).
>
> Yes, and I agree there would be too much magic here. Also we would have
> no control over which components make use of the information stored in
> PIs. I think it's much better to introduce this new interface and have
> full knowledge of which components implement it.

I don't like any implementation that completely hides its workings.
However, having the key information passed about as metadata to the data
stream potentially allows for _any_ SAX data stream to become part of the
final results and still be cached. I can't give you a concrete use case,
but I'm guessing that this could be used for SOAP and other foreign data
stream encapsulation. Of course, if you really want to have that option,
then that means some kind of standardized metadata, and that's what
standard HTTP headers are all about. So maybe the "proper" implementation
here would be to use completely formed responses and parse the headers!
That's real work and no longer trivial, with no direct benefit for the
moment. Moreover, nothing you are doing would preclude such an
implementation in the future; I could see some form of standardized,
SOAP-like parser building a cache key for a foreign data stream that
would then be coupled into the pipeline implementation you are proposing,
if need be.

Phew, a lot of discussion, but I think it's important; as Cocoon separates
into discrete blocks we are essentially going to have to decide how
decoupled the blocks are. Caching often seems to be an afterthought in
distributed systems (which is what we will be building) and it's important
to understand the implications of the design decisions up front. If you
had presented your current proposal when you originally asked the
question I probably wouldn't have even responded, but would have continued
to have some nagging thoughts about this issue that I never expressed. So
forgive the rambling, but it helps me even if it doesn't help you...

--
Peter Hunsberger