cocoon-dev mailing list archives

From Sylvain Wallez <sylv...@apache.org>
Subject Re: [c3] Conditional GET
Date Thu, 10 Dec 2009 08:07:58 GMT
Reinhard Pötz wrote:

<snip/>

> But let me broaden the picture: Based on our work from about two weeks
> ago, I created another aspect which implements the support for
> conditional GET requests and also takes care that a pipeline isn't
> executed unless it is really necessary. I was also able to fix all
> failing test cases. I created an issue that contains a patch:
> https://issues.apache.org/jira/browse/COCOON3-47
>
> Additionally there is also another feature that I would like to add: The
> current patch only takes care of 'If-Modified-Since' requests. I also
> want to support 'If-None-Match' requests that are based on the 'ETag'
> response header. (see http://en.wikipedia.org/wiki/HTTP_ETag).
>
> Using ETag has the advantage that we could support conditional GET
> requests also in the case where we can't use a timestamp based approach
>  (e.g. when using o.a.c.pipeline.caching.ParameterCacheKey) or to
> provide conditional GET support in REST controllers.
>
> As an ETag value we could use the hash code of a pipeline's cache key.
>   

I don't fully get the context of this conversation, but this last
sentence raised a question for me: how can we validate a cache entry
with its _key_? Looking at the code, I see that CacheKey holds both the
identifier information (the actual key) and the validity information.

There is a naming issue here that leads to some confusion between key
and key-and-validity, and we can see it in the code: ExpiresCacheKey
doesn't include the validity information in hashCode() and equals(),
whereas ParameterCacheKey does. What is the right contract?

As a side note, both classes include the class's hash code in the
instance's hash code, which means hash codes can differ at every JVM
restart, or across JVM instances in a cluster. This is likely to break
persistent and distributed caches.
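
A minimal sketch of the side note above. The helper names are
hypothetical and this is not the actual Cocoon code; it only assumes
that the class's own hashCode() is mixed into the result. Class does
not override Object.hashCode(), so that value is an identity hash,
whereas String.hashCode() is specified by its Javadoc and is therefore
the same in every JVM.

```java
class StableHashSketch {

    // Mixes in Class.hashCode(): an identity hash that may change
    // between JVM runs and between nodes of a cluster.
    static int unstableHash(Object key, String id) {
        return 31 * key.getClass().hashCode() + id.hashCode();
    }

    // Mixes in the class *name* instead: String.hashCode() is computed
    // from the characters only, so the result is reproducible.
    static int stableHash(Object key, String id) {
        return 31 * key.getClass().getName().hashCode() + id.hashCode();
    }

    public static void main(String[] args) {
        Object key = new Object();
        System.out.println("unstable: " + unstableHash(key, "sitemap:/index"));
        System.out.println("stable:   " + stableHash(key, "sitemap:/index"));
    }
}
```

Running this twice should print the same "stable" value each time,
while the "unstable" value is free to change.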

That being said, I'm wondering if this aggregation of key and validity 
won't cause other kinds of problems with distributed cache 
implementations. For example, Java memcached clients serialize the cache 
key and use this result as the memcache key. If the key includes 
validity information, the memcache key will change every time the 
underlying data changes (e.g. a file's timestamp).

At first sight, this can sound good as it means we will have a cache
miss when the validity has changed, and will even avoid having to
compare the validity of cached content. But this can have a disastrous
impact on cache efficiency in situations where some frequently
requested content changes often: the cache will quickly fill up with
obsolete versions of this content under different key values, which
will cause older content to be evicted, reducing the overall cache
efficiency. A key that is only an identifier, on the other hand, causes
the entry to be _replaced_ rather than a new one being added.
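
The effect described above can be illustrated with a toy cache. This is
only a sketch under stated assumptions: a three-entry LRU map built on
java.util.LinkedHashMap stands in for a real cache such as memcached,
and the key strings ("hot", "other") are made up for the example.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CacheChurnSketch {

    /** Tiny LRU cache: drops the least recently used entry beyond maxEntries. */
    static Map<String, String> lru(final int maxEntries) {
        return new LinkedHashMap<String, String>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, String> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Stores one unrelated entry, then rewrites a frequently changing
     * entry `updates` times. Returns true if the unrelated entry is
     * still cached afterwards.
     */
    static boolean otherSurvives(boolean validityInKey, int updates) {
        Map<String, String> cache = lru(3);
        cache.put("other", "rarely changing content");
        for (long lastModified = 1; lastModified <= updates; lastModified++) {
            // With validity in the key, every change creates a new entry.
            String key = validityInKey ? "hot@" + lastModified : "hot";
            cache.put(key, "version " + lastModified);
        }
        return cache.containsKey("other");
    }

    public static void main(String[] args) {
        System.out.println("validity in key: " + otherSurvives(true, 5));
        System.out.println("identifier only: " + otherSurvives(false, 5));
    }
}
```

With the validity in the key, the obsolete "hot@..." versions push the
unrelated entry out of the cache; with an identifier-only key, the
single "hot" entry is overwritten in place.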

So in the end, my feeling is that key and validity information really 
should be separated.
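
One possible shape for such a separation, as a rough sketch only: the
class and method names below are hypothetical and do not come from the
Cocoon 3 code base, and a last-modified timestamp is assumed as the
only kind of validity. The map is keyed by the identifier alone, and
the validity is stored next to the cached value and compared at lookup
time.

```java
import java.util.HashMap;
import java.util.Map;

class SeparatedValiditySketch {

    /** Cached content plus the validity it was produced under. */
    static final class Entry {
        final long lastModified;
        final String content;

        Entry(long lastModified, String content) {
            this.lastModified = lastModified;
            this.content = content;
        }
    }

    private final Map<String, Entry> store = new HashMap<String, Entry>();

    /** Returns the cached content if it is still valid, otherwise null. */
    String get(String id, long currentLastModified) {
        Entry e = store.get(id);
        return (e != null && e.lastModified == currentLastModified) ? e.content : null;
    }

    /** Stores content under the identifier, replacing any older version. */
    void put(String id, long lastModified, String content) {
        store.put(id, new Entry(lastModified, content));
    }

    int size() {
        return store.size();
    }
}
```

A stale entry then shows up as a miss on get(), and the following
put() overwrites it instead of adding a second entry.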

Now going back to the ETag discussion, using the pipeline's cache key
won't work IMHO because some keys' hashCode() implementations use only
the identifier part of the key and not the validity.
Confusion, I told you ;-)
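
To make the ETag problem concrete, here is a small sketch. IdOnlyKey is
a hypothetical stand-in for a key whose hashCode() ignores validity; it
is not one of the real Cocoon classes, and both ETag formats are
invented for the illustration.

```java
class ETagSketch {

    /** Key whose equals()/hashCode() cover the identifier only. */
    static final class IdOnlyKey {
        final String id;
        final long lastModified; // validity, deliberately not hashed

        IdOnlyKey(String id, long lastModified) {
            this.id = id;
            this.lastModified = lastModified;
        }

        @Override
        public boolean equals(Object o) {
            return o instanceof IdOnlyKey && ((IdOnlyKey) o).id.equals(id);
        }

        @Override
        public int hashCode() {
            return id.hashCode();
        }
    }

    /** ETag taken from the key's hash code alone. */
    static String etagFromKeyHash(IdOnlyKey key) {
        return "\"" + Integer.toHexString(key.hashCode()) + "\"";
    }

    /** ETag that also mixes in the validity. */
    static String etagWithValidity(IdOnlyKey key) {
        return "\"" + Integer.toHexString(key.id.hashCode())
                + "-" + Long.toHexString(key.lastModified) + "\"";
    }

    public static void main(String[] args) {
        IdOnlyKey before = new IdOnlyKey("sitemap:/index", 1000L);
        IdOnlyKey after = new IdOnlyKey("sitemap:/index", 2000L);
        System.out.println(etagFromKeyHash(before).equals(etagFromKeyHash(after)));
        System.out.println(etagWithValidity(before).equals(etagWithValidity(after)));
    }
}
```

With the hash-only ETag, a client sending If-None-Match would keep
matching even though the underlying content has changed; the second
variant changes whenever the validity does.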

And BTW, what is the "jmxGroupName" property on CacheKey used for?

Sylvain

-- 
Sylvain Wallez - http://bluxte.net

