cocoon-dev mailing list archives

From Giacomo Pati <giac...@apache.org>
Subject Re: [RT] sharing latest research production (long!)
Date Sun, 04 Mar 2001 17:02:55 GMT
Stefano Mazzocchi wrote:

> Paul Lamb wrote:
> > I've read your RT twice now and with your comments to Robin it does make
> > sense. It brought back a real deja vu from over a decade ago when I sat
> > through an entire day on the design and implementation of the scheduler,
> > paging, MMU and TLB of the RS/6000 right after it first came out. Very
> > similar thought processes.
>
> I thank you for this. It's a very good compliment.
>
> > My first thought on the formulas is what about when the first retrieval
> > is hugely expensive, but after that it's not at all. I'm not sure where
> > you'd put information like ignore the first x number of accesses.
>
> Oh, no, nothing hugely expensive. The only thing that happens is that
> the frequency of caching depends on the efficiency: the more efficient
> the resource, the fewer hits it takes to reach nearly optimal caching
> performance; the less efficient, the more hits it takes. But since that
> resource was slow anyway, the net result is that no huge expense is
> incurred.
>
> Anyway, I plan to write a small "visual fake cache" to show you how the
> cache works, something like a JMeter that estimates efficiency and
> visualizes cache entries and levels of efficiency, and lets you modify
> parameters in real time to show how it adapts.
>
> I know, it's a toy, but it could give interesting views into the best
> way to visualize efficiency information.
>
> > From a real-world, non-theoretical, perspective the one that worries me
> >
> > is:
> > >       +----------------------------------------------------------+
> > >       | Result #2:                                               |
> > >       |                                                          |
> > >       | Each cacheable producer must generate the unique key of  |
> > >       | the resource given all the environment information at    |
> > >       | request time                                             |
> > >       |                                                          |
> > >       |   long generateKey(Environment e);                       |
> > >       +----------------------------------------------------------+
> >
> > Here's the problem I see. I create a hash function that works great, the
> > code goes into production for a year and then some really important
> > person decides that there needs to be a change. The changes are
> > relatively easily made to the producer but nobody thinks to update the
> > hashing function. Now I've introduced the possibility that the wrong
> > data will be pulled from the cache and delivered to someone. I'd really
> > hate to try and track down a bug like this.
> >
> > Secondly, hash functions themselves. For the programmer that's never
> > done one before they can seem rather foreign; I'd wager that most have
> > never even seen one, and very few have had to code one. And from
> > experience it can require lots of testing to make sure it's 100%
> > correct.
> >
> > Am I missing something here? Is this a lot easier than I think?
>
> No, you are totally right, from this perspective, it's a real pain in
> the ass.
>
> But luckily, Ricardo and I thought about a solution for this problem
> (yes, I explained part of this RT to Ricardo weeks ago): re-inversion of
> control.
>
> Look at these interfaces:
>
>  public interface Cacheable {
>    public void fillPolicy(CachePolicy policy);
>  }
>
>  public interface CachePolicy {
>     public void addDependency(String variableName,
>                               Object variableValue);
>     public void setTimeToLive(long seconds);
>  }
>
> The only thing you have to provide is the dependencies you have
> (filename, cookie parameter value, time of the day, system load, etc)
> and, if you have it, your time2live.
>
> The cache will then generate the key for you based on all the
> information you provided.
>
> Isn't it smart? :)
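To make the quoted inversion-of-control idea concrete, here is a minimal hypothetical sketch. Cacheable and CachePolicy mirror the quoted interfaces; FileProducer, SimpleCachePolicy and the key derivation are invented for illustration and are not Cocoon code:

```java
// Hypothetical sketch of the inversion-of-control idea quoted above.
import java.util.LinkedHashMap;
import java.util.Map;

interface CachePolicy {
    void addDependency(String variableName, Object variableValue);
    void setTimeToLive(long seconds);
}

interface Cacheable {
    void fillPolicy(CachePolicy policy);
}

// The producer only declares what its output depends on;
// it never computes a hash itself.
class FileProducer implements Cacheable {
    private final String filename;
    FileProducer(String filename) { this.filename = filename; }

    public void fillPolicy(CachePolicy policy) {
        policy.addDependency("filename", filename);
        policy.setTimeToLive(3600);
    }
}

// The cache, not the producer, turns the declared dependencies into a key,
// so the key changes whenever the declared dependencies change.
class SimpleCachePolicy implements CachePolicy {
    private final Map<String, Object> deps = new LinkedHashMap<>();
    private long ttl = -1;

    public void addDependency(String name, Object value) { deps.put(name, value); }
    public void setTimeToLive(long seconds) { ttl = seconds; }

    long generateKey() {
        long key = 17;
        for (Map.Entry<String, Object> e : deps.entrySet())
            key = 31 * key + e.getKey().hashCode()
                + 31 * String.valueOf(e.getValue()).hashCode();
        return key;
    }
}
```

The point of the sketch: since the producer never writes a hash function, the "somebody changed the producer but forgot the hash" bug Paul describes has nowhere to hide.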

Well, we thought it is a little bit overdesigned, so here is the smart 
solution :-)) that we (Daniel and I) developed (I announced this prior to my 
vacation).

Imagine a CachableXMLProducer that implements the following interface:

  public interface CachableXMLProducer {
    Object getKey();
    CacheValidity getValidity();
  }

The getKey method returns an object that uniquely identifies the universe 
the XMLProducer is in. For a FileGenerator it's probably the file it is 
parsing. A TraxTransformer returns the name of its stylesheet. Other 
Producers return appropriate Objects. 

Take into account that the CacheManager will use this Object together with 
the type of Producer to get a unique key into its CacheStore.
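As an illustration of that (type, key) combination, here is a hypothetical ValidityStore sketch. Its internals are invented; only the idea that a key is unique per producer type comes from the text above:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical store: a producer's key is only unique within its own type,
// so entries are indexed by the (producer type, key) pair.
class ValidityStore {
    private final Map<String, Object> store = new HashMap<>();

    private static String compositeKey(Object key, Object producer) {
        return producer.getClass().getName() + ":" + key;
    }

    Object get(Object key, Object producer) {
        return store.get(compositeKey(key, producer));
    }

    void put(Object key, Object producer, Object validity) {
        store.put(compositeKey(key, producer), validity);
    }
}
```

Two producers of different types can thus safely use the same key object (e.g. the same filename) without colliding in the store.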

The getValidity method returns a CacheValidity object that has a single 
isValid() method (more on this later).

The process of determining if a CacheEntry is still valid goes like this:

The CacheManager asks the CachableXMLProducer to return its key (getKey()). 
This key, together with the type of XMLProducer it came from, is used as an 
index into a ValidityStore which holds the CacheValidity objects obtained so 
far. If there is no corresponding CacheValidity object, no CacheEntry has 
been stored so far. If there was a stored CacheValidity object, it is passed 
to the CacheValidity object obtained by a call to the getValidity method of 
the CachableXMLProducer:

   Object key = cachableXMLProducer.getKey();
   CacheValidity validity = validityStore.get(key, cachableXMLProducer);
   if (validity != null) {
       CacheValidity newValidity = cachableXMLProducer.getValidity();
       if (newValidity.isValid(validity)) {
           // previous CacheEntry is still valid
       } else {
           // previous CacheEntry is invalid
       }
   }


Now let's talk about the CacheValidity object. 

  public interface CacheValidity {
    boolean isValid (CacheValidity validity);
  }

It is important that the isValid() method of a CacheValidity object can only 
compare against exactly the same type of CacheValidity object. The algorithm 
used by the cache manager above should ensure that only CacheValidity 
objects of the same type are compared. This also means that a specific type 
of CachableXMLProducer must always generate the exact same type of 
CacheValidity object. The CachePolicy design of Ricardo and Stefano would 
allow this as well, but we think it is not a good idea.

Almost 80% of the validity objects will be TimeStampValidity objects, as in 
the case of the FileGenerator or TraxTransformer, so a concrete 
TimeStampValidity class will take care of this type of validity. 
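A concrete TimeStampValidity could look roughly like this (a sketch only; the actual class may differ):

```java
// Minimal CacheValidity as defined above.
interface CacheValidity {
    boolean isValid(CacheValidity validity);
}

// Sketch of a TimeStampValidity: it only compares against another
// TimeStampValidity, and the entry stays valid while the recorded
// timestamp (e.g. a file's last-modified time) is unchanged.
class TimeStampValidity implements CacheValidity {
    private final long timeStamp;

    TimeStampValidity(long timeStamp) { this.timeStamp = timeStamp; }

    long getTimeStamp() { return timeStamp; }

    public boolean isValid(CacheValidity validity) {
        if (!(validity instanceof TimeStampValidity)) return false;
        return ((TimeStampValidity) validity).getTimeStamp() == timeStamp;
    }
}
```

Note how the instanceof check enforces the "same type only" rule stated above: a TimeStampValidity silently reports invalid when handed any other kind of validity.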

You can think of all kinds of validities you need. 

Additionally, you can have a ValidityContainer class which holds several 
Validity objects and returns the ANDed value of all their validation 
results, as well as an OrValidityContainer which ORs those results.
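An ANDing container along those lines might be sketched as follows (hypothetical; an OrValidityContainer would instead return true as soon as one pair matches):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal CacheValidity as defined above.
interface CacheValidity {
    boolean isValid(CacheValidity validity);
}

// Sketch of an ANDing ValidityContainer: the aggregate is valid only if
// every contained validity accepts its counterpart in the stored container.
class ValidityContainer implements CacheValidity {
    private final List<CacheValidity> validities = new ArrayList<>();

    void add(CacheValidity v) { validities.add(v); }

    public boolean isValid(CacheValidity validity) {
        if (!(validity instanceof ValidityContainer)) return false;
        List<CacheValidity> old = ((ValidityContainer) validity).validities;
        if (old.size() != validities.size()) return false;
        for (int i = 0; i < validities.size(); i++)
            if (!validities.get(i).isValid(old.get(i))) return false; // AND
        return true;
    }
}
```

Since the container is itself a CacheValidity, containers can nest, which is what allows arbitrary boolean combinations of validities.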

We think this design is very simple to implement and fast as well. It also 
allows any boolean combination to be implemented to check the validity.

Giacomo

>
> [while the previous RT is all mine, these advanced interface concepts
> were developed by Ricardo and me; he deserves full credit for bringing
> new ideas into the old picture]
>
> Please find attached the document that Ricardo wrote to me that
> describes parts of the cache discussion we had before he left Italy to
> go back to Colombia. It contains some parts that are obsoleted by my new
> efficiency algorithm, but the overall concepts still hold and are more
> implementation-oriented than academic, so some of you might find them
> more interesting.
>
> Ricardo, do you copy?

[Attachment: caching.html (text/html)]

[Attachment 2 (text/plain)]
