cocoon-dev mailing list archives

From Grzegorz Kossakowski <g...@tuffmail.com>
Subject Re: Postable source and servlet services problem
Date Thu, 15 Feb 2007 21:11:38 GMT
Peter Hunsberger wrote:
> On 2/15/07, Grzegorz Kossakowski <grek@tuffmail.com> wrote:
>> Peter Hunsberger wrote:
>> >>
>> >> I think we hit a little design flaw here. A transformer is an atomic
>> >> sitemap component, but a servlet service is a pipeline fragment. We
>> >> try to treat the pipeline as a transformer, and this leads to
>> >> problems like the one outlined above. From the user's and the
>> >> sitemap-elegance point of view it is justified to treat a servlet
>> >> service as exactly one transformer (or any other component), because
>> >> it is meant to hide all implementation details of the service, and
>> >> the service really does the processing in exactly the same
>> >> (semantic) way as a transformer. Even though it is a technical
>> >> problem, it *is* a serious problem, and I have to admit that I have
>> >> no idea how to fix it in a clever, clean way.
>> >>
>> >
>> > I was wondering about this the other day when someone was posting
>> > about implementing proper HTTP 304 status code handling (was that
>> > you?)... My first thought was that you're going to need some kind of
>> > extra metadata passing mechanism for those cases where the consumer or
>> > producer is coupled to another servlet; you don't have all the
>> > information you need to make these kind of decisions in any one spot
>> > anymore.
>
> <snip>GET vs. POST discussion</snip>
>
>> The most problematic issue with POST requests is, in contrast to GET
>> requests, generating cache keys. We need some short string that
>> uniquely identifies the resource included in the request body.
>> Computing some kind of hash sum is not the way to go (no uniqueness
>> guaranteed, and it is costly); moreover, there is no way to discover
>> the key in the pipeline or pipeline component on the servlet service
>> provider's side. It must be included in the request.
>
> I think you're creating an artificial distinction for yourself here.
> Semantically there is no difference between the GET request parameters
> and the POST octet stream from a caching perspective; they both result
> in some pipeline invocation that is determined by their content.
I would defend my position that there is a difference. I agree that the 
boundary is fluid (one could encode a whole, big file as a request 
parameter), but the main point is that _usually_ request parameters 
determine which resource will be chosen from the space of possible 
choices. When a GET request is processed, the information about the 
choice itself is available, and that enables the pipeline to pull its 
initial data from any source (filesystem, database, another host, 
etc.), and this data is unambiguously identifiable. The space of 
possible choices is usually infinite but restricted, and, most 
importantly, we know a lot about this space. When it comes to a POST 
request, we have _only data_ and no information about the space of 
choices. I would love to go further into these spaces and define them 
in a scientific way, as I'm a math student, but I'm aware that it would 
reduce the number of people willing to follow my arguments to nearly 
zero. If I'm wrong, just let me know. ;-)

In short, request parameters control the flow of the pipeline and the 
choice of the data to be processed, but they are not a source of the 
data. A POST request's body _is_ a source of the data; that's the 
difference.
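To make the distinction concrete, here is a minimal, purely illustrative 
sketch (not Cocoon code, all names made up): for a GET, the URI plus 
query string already is a short, unique, cheap cache key, while for a 
POST no comparable identifier can be derived on the service side without 
hashing the whole body.

```java
// Illustrative sketch only, not Cocoon code: a GET is identified by its
// URI plus query string, so a cheap, unique cache key already exists.
final class GetKeySketch {
    static String cacheKeyForGet(String uri, String queryString) {
        // The parameters only *select* a resource, so they identify it.
        return uri + "?" + queryString;
    }
    // For a POST there is no such shortcut: the body *is* the data, and
    // hashing it would be costly and not guaranteed unique, which is
    // why the key has to be supplied in the request itself.
}
```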
>
>> This way we have reached the essence of the problem. The caller must
>> include the cache key information, and fortunately it is available on
>> the caller's side, but at the pipeline level, not in the component
>> (the service transformer) that is actually calling the service and
>> formulating the request. If the service caller could have an
>> aggregation of the cache keys from the pipeline fragment that
>> precedes it, our problem would be solved.
>
> Yes, that's why I said: "you don't have all the information you need
> to make these kind of decisions in any one spot anymore."
I wasn't sure what exactly "these kind of decisions" meant.
>
>> I see two possible solutions:
>> 1. Pulling up the service caller to the pipeline level, it would mean
>> introducing some new sitemap syntax/element. I'm not happy with this one
>> so I'm not going to discuss details.
>> 2. Introducing new interface like CacheableKeyAwareProcessingComponent
>> that could have these methods:
>>
>> Serializable getKey();
>> SourceValidity getValidity(Serializable aggregatedKeys);
>>
>> It would be very easy to change the pipeline code to use the new
>> interface when a component actually implements it. One could say that
>> it's a kind of hack, but I have no better option right now.
>
> If I understand you, you're suggesting that cache validity objects can
> hang around in the Request Attributes (or some other location that can
> be discovered in a similar fashion) for use across decoupled
> components via these methods?
Not really. I'm suggesting passing the aggregatedKeys (which can be 
seen as a single key, because an aggregation of keys is also a key) as 
a request attribute/ETag header/(insert your favorite); it will become 
the key for the ServiceConsumer generator I described earlier. I think 
I've explained well enough why this key is needed, but I did not 
explain why it's passed to the getValidity method. Well, I'll try to 
give you an example:
<map:generate src="some.xml"/>
<map:transform type="service">
 <map:parameter name="service" value="servlet:other_servlet:/some_service"/>
</map:transform>
<map:serialize/>

On the "other_servlet" side:
<map:generate type="ServiceConsumer"/>
<map:transform type="transform1"/>
<map:serialize type="ServiceProducer"/>

Then the flow is something like this:
1. Calculation of cache keys:
<map:generate src="some.xml"/>
returns "FileGenerator:file://../some.xml"

<map:transform type="service">
 <map:parameter name="service" value="servlet:other_servlet:/some_service"/>
</map:transform>
returns "ServiceTransformer:servlet:other_servlet:/some_service"

<map:serialize/>
returns "XMLSerializer".

2. Cache validation. We assume here that a cache entry has been found 
and everything is still valid.
<map:generate src="some.xml"/>
returns FileStampValidity

<map:transform type="service">
 <map:parameter name="service" value="servlet:other_servlet:/some_service"/>
</map:transform>
is passed the aggregated cache keys of the preceding components 
(including its own key), so we have a call like this:
getValidity("{FileGenerator:file://../some.xml, ServiceTransformer:servlet:other_servlet:/some_service}")
The key will get passed on to the validity.

<map:serialize/>
returns NOPValidity (always true)

Now, we assumed that the first validity returns true when asked if it 
is valid. The second validity will work in a similar way to the 
ServletValidity I implemented lately[1]. It will include the cache key 
as the If-Match header. The formulated request would look like this one 
(it's not proper HTTP format here):
GET "some_service"
If-Match: {FileGenerator:file://../some.xml, ServiceTransformer:servlet:other_servlet:/some_service}
If-Modified-Since: [date obtained from the old validity, see ServletValidity for clues]

By requesting GET we ask whether the entry identified by the If-Match 
value for the "servlet_service" resource was modified since the time 
specified in the If-Modified-Since header. The HTTP specification does 
not settle what should happen when both If-Match and If-Modified-Since 
are present, so let's define three expected situations:
1. Nothing changed on the service side; all cache entries are present 
and still valid. A Not Modified (304) status code is returned, and that 
is a signal for the invoked isValid method to return true. The response 
body is empty.
2. Something changed, but the cached resource (the same one that was 
POSTed in some previous POST request) for the ServiceConsumer is still 
there, so it can be used to do new processing. An OK (200) status code 
is returned and the response body contains the new content for the 
service transformer. The isValid method returns false, but the response 
body is fed to the service transformer.
3. Something changed and there is no cache entry for the 
ServiceConsumer (there is no guarantee that it will exist forever). 
There is no way to calculate a new result, so the response carries a 
Precondition Failed (412) status code and the response body is empty. 
Precondition Failed is returned when the If-Match condition is not 
satisfied.

In the second and third cases normal pipeline processing will take 
place. The difference is that in the second case we don't have to 
formulate a POST request because we already have its response! (nice 
magic, heh? :))

3. We have all the information needed (whether it is valid or not, the 
processed data, etc.), so normal processing can happen.

I forgot about one quite important thing: the cache key returned by the 
ServiceConsumer generator would be:
ServiceConsumer:{FileGenerator:file://../some.xml, 
ServiceTransformer:servlet:other_servlet:/some_service} (yes, the 
value obtained from the If-Match header field)
The ServiceProducer serializer would return a meaningless key like:
"ServletProducer"
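To tie the pieces together, here is a hedged Java sketch of the 
proposed interface (option 2 from earlier in the thread) together with 
the key aggregation. All names are illustrative, not actual Cocoon API, 
and SourceValidity is replaced by Serializable/String here only to keep 
the sketch self-contained:

```java
import java.io.Serializable;
import java.util.List;

// Hypothetical interface from option 2; not actual Cocoon API.
interface CacheableKeyAwareProcessingComponent {
    Serializable getKey();
    // The pipeline passes in the aggregated keys of the preceding
    // fragment (including this component's own key).
    Serializable getValidity(Serializable aggregatedKeys);
}

final class KeySketch {
    // An aggregation of keys is itself a key, so it can be sent as the
    // If-Match header value and reused as the ServiceConsumer's key.
    static String aggregate(List<String> componentKeys) {
        return "{" + String.join(", ", componentKeys) + "}";
    }

    // On the service side, the ServiceConsumer generator derives its
    // key from the If-Match value sent by the caller.
    static String serviceConsumerKey(String ifMatchValue) {
        return "ServiceConsumer:" + ifMatchValue;
    }
}
```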

>
>> > I haven't explored the details but could you generate a
>> > Processing Instruction (PI) that could be used to pass such
>> > information down the pipeline?
>
>> >
>> Where could this processing instruction be generated? I don't
>> understand how it helps us...
>
> It's an alternative to plopping the aggregated cache key information
> into memory as an object.  Instead it is added to the SAX data stream
> as metadata.  This allows complete decoupling but it's also a bit of a
> hack as now you have to recognise the PI and act on it; the result is
> more-or-less magic for anyone that is not aware of the PI and what it
> does.  The nice thing is that a PI can be added at any point upstream and
> you can add as many as you want without affecting the final result
> (since any post processing is supposed to ignore any PI that it does
> not recognise).
>
I see. Everything I've written above applies regardless of which way we 
choose to provide this information.
My concern about your solution is: where could this PI be generated and 
injected into the SAX stream? I think it's obvious that we can't force 
components to include these PIs in the SAX stream, as that is really an 
implementation detail. AFAIK the pipelines themselves do not manipulate 
the SAX streams that go through them; they only determine which 
components should be connected and do the actual connecting.
So I repeat my question: where could this processing instruction be 
generated and included?

PS. Sorry for such a long mail, but I would like everyone, now and in 
the future, to be aware of changes to essential parts of the Cocoon 
code. Moreover, I think neither the abstract concepts nor the 
implementation details are dead simple.

-- 
Grzegorz Kossakowski
