cocoon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Stefano Mazzocchi <stef...@apache.org>
Subject [RT] Adaptive Caching
Date Sun, 13 Jul 2003 01:39:01 GMT
NOTE: this is a refactoring of an email I wrote 2 1/2 years ago. The
original can be found here:

   http://marc.theaimsgroup.com/?l=xml-cocoon-dev&m=98205774411049&w=2

I re-edited the content to fit the current state of affairs and I'm
resending hoping to trigger some discussion that didn't happen in the 
past.

                         ---------- o ----------

WARNING: this RT is long! and very dense, so I suggest you to turn on
your printer. Also, keep in mind this is comes from hard-core accademic
research and you might find it 'too theorical' for your practical taste.
If it's the case, get over it! since it's only when you fly high in the
abstraction that you percieve the borders of your problem space. You'll
find some math formulas, but I've avoided all deep statistical proofs
that don't really belong here.

At the end, I'd like to warn you about the fact that some of the
hypothesis (thus the results) have not been proven in real life, but are
just whiteboard speculations (even if very the most reasonable I could
find!). So it is entirely possible that my general abstractions
represent, instead, a very specific concern area that is barely useful
in real life. I *strongly* believe this is not the case, but only an
implementation can prove me right or wrong.

I'd like everyone of you to concentrate on the facts and on the results,
applying this new information on the concepts you already have in mind
and see how these change the picture or how they are found wrong in your
particular concern area.

As usual, all feedback will be welcome.

                         ---------- o ----------

While there is no reason to say that the Cocoon architecture is
inherently slow, the operations performed on the content are generally
heavy and time consuming. Enter the content aggregation world and
improving performance by eliminating unnecessary operation becomes more
and more important.

Caching is the act of storing the result of a resource production and
avoid further production during the ergodic period of that resource at
the request time.

The 'ergodic period' of a resource is the time period when the resource
doesn't change. In short, when the result of the cache lookup and the
resource production are exactly the same.

It must be noted that even the most frequently updated resources (i.e. a
page including time of the day) has an ergodic period greater than zero.
Even a page that shows the time of the day with a second granularity,
has an ergodic period of one second.

It must also be noted that caching is *NOT* automatically efficient, as
we will show later on.

Final note: we are discussing resources which are produced using a
"cacheable" pipeline *ONLY*. If the pipeline is not cacheable (means:
it's not entirely composed of cache-aware components) caching
never takes place.

                                - o -

A general caching model for a single cacheable resource production is:

       +-----(no)------------------> production --------------+
       |                                                      |
  -> cache? -(yes)-> valid? -(yes)----------------> lookup ---+-->
                       |                                      |
                       +-----(no)--> production --> storage --+

Thus, given

  r        := resource
  t        := time of the request
  V(r,t)   := validity estimation cost
  P(r,t)   := production cost when caching is not performed
  Pc(r,t)  := production cost when caching is performed
  l(r,t)   := lookup cost
  s(r,t)   := storage cost

and

  I := sequence of invalid requests time [1,n]
  V := sequence of valid request time [1,m]

we have that

                n
                --
  Invalid(r) =  \  (V(r,I ) + Pc(r,I ) + s(r,I ))
                /        i          i         i
                --
               i = 1

and

              m
              --
  Valid(r) =  \  (V(r,I ) + L(r,I ))
              /        j         j
              --
            j = 1

and

             n                m
             --               --
  Ptot(r) =  \  P(r,I )   +   \  P(r,V )
             /       i        /       j
             --               --
           i = 1             j = 1


caching is efficient if and only if


  Invalid(r) + Valid(r) < Ptot(r)      [1]


Let's make some considerations:

1) the dimension of 'cost' has not been specified, this means it can
include everything that is 'scarce' on the system (memory, cpu, network
connections... or even interrupts or multiprocessing degree).

2) all costs are function of the resource *and* time. This takes into
consideration the fact that each cost is not generally unrelated to the
other costs and to the rest of the system. For example, the storage cost
may be higher during load peaks since memory consuption for other
resource generation might force the system to use virtual memory or have
less CPU time.

3) disequation [1] answers the boolean question "should we cache this
resource?".

It is evident that the solution of caching question can be determined
exactly only when all the necessary information are available. This is
the case only when:

  1) all requests have already happened
  2) it's possible to evaluate the single costs independently

Unfortunately, these requirements cannot be satisfied "a priori". This
means that it's virtually impossible to obtain optimal efficiency for a
physical cache system.

Nevertheless, given

  Ctot(r) = Valid(r) + Invalid(r)

the cache efficiency for the given resource is given by

  eff(r) = Ptot(r) - Ctot(r)

which represents a way to determine "a posteriory" the efficiency of the
cache system on the particular resource for a particular set of
requests. This gives us information on how well caching behaves on a
particular resource and can be used to determine whether or not to turn
caching off at a certain point in time.

                                - o -

The goal of a caching system is to be adaptive toward optimal efficiency
and automatically decide whether or not to cache a particular resource
based on his previous efficiency history.

The first result of the above model is that site administrators cannot
decide whether or not a particular resource needs to be cached since
they don't have a way to measure the efficiency of the cache on that
particular resource: they don't have all the necessary information.

So:

       +----------------------------------------------------------+
       | Result #1:                                               |
       |                                                          |
       | To obtain optimal caching efficiency, the system must be |
       | totally adaptive on all cacheable resources.             |
       |                                                          |
       |             which means (in Cocoon terms)                |
       |                                                          |
       | The sitemap should *NOT* contain caching information     |
       | since the caching concern and discrimintation doesn't    |
       | belong to any individual's concern area.                 |
       +----------------------------------------------------------+

                                     - o -

There are three possible ways to generate a resource

  1)  ---> cache? -(yes)-> production --->
  2)  ---> cache? -(yes)-> valid? -(no)--> production --> storage -->
  3)  ---> cache? -(yes)-> valid? -(yes)-> lookup --->

To be able to calculate theorical efficiency, the cost of the three
states must be available at all moments. Since all of them generate the
same exact resource, it would be foolish to fire up all of them per each
request just to have the exact value of the costs and estimate the
ergodic period of that resource.

At this point, it's not possible to go further unless a few property are
assumed on the system. We will concentrate on systems where costs and
ergodic periodicity exhibit time locality.

This means that the cost of a state is presumed equal to the last known
cost for that state and the ergodic periodicity of the resource equals
the average ergodic periodicity exhibited until then.

NOTE: this is very strong property even if at first appears innocent.
It's normal to think at caching being useful under heavy load, thus high
request frequencies, thus high time locality for system resources. Since
we have shown how request frequency doesn't indicate, alone, the
efficiency of a caching scheme, we are reducing us to optimize those
systems where time locality is generally valid. Since no other
assumption is feasible, we'll go forward assuming time locality from
here on, but it must be understood that results from here depend heavily
on this property of the system.

This said, we have the tools to design an algorithm that maximises
caching efficiency.

This can be reduced to the creation of the caching discriminator that
must indicate whether or not caching should be performed.

This discrimination is obviously a function of the caching efficiency
for the resource: if the cache has proven efficient, the probability
that caching will be performed again is directly proportional to that
efficiency. At the same time, the chance of performing caching decreases
if the efficiency lowers.

Given

   c(eff) := chance of performing cache given a cache efficency

some general properties of the function can be observed:

  - c: [-inf,+inf] ---> [ 0,1]

  - c(0) = 0.5

  - lim         c(eff) = 1
    eff-->+inf

  - lim         c(eff) = 0
    eff-->-inf

one possible function is

  c(eff) = 0.5 tanh(k * eff) + 0.5

where

  tanh(x) := hyperbolic tangent of x
  k       := any positive real number

Thus, the discriminating algorithm is:

  - generate a random real value between [ 0,1] ---> n
  - obtain the caching efficiency for the given resource --> eff(r)
  - calculate the chance of caching ---> c(eff(r))
  - perform caching if  n < c(eff(r))

what is left to define is how efficiency is calculated:

  eff = C1 = C2 = C3 = total = cached = 0
  while (true) {
   reset cost evaluation
   discriminate caching
   if caching {
    total++
    eff += C1
    if (valid) {
     perform lookup
     C3 = evaluated cost
     eff -= C3
     cached++
    } else {
     perform production
     perform storage
     C2 = evalutated cost
     eff -= C2
    }
   } else {
    C1 = evaluated cost
    eff += C1
    f = total / cached
    eff -= C2 * (1 - f) + C3 * f
   }
  }

where is evident the use of time locality since Ptot is always
incremented by the same quantity even if the cost evaluation for the
state 1 cannot be performed during state 2 or 3 and viceversa.

                                     - o -

It must be noted how the above algorithms are totally abstract and don't
rely on any particular behavior of the resource generation rather than
cacheability and time locality.

The system is totally automatic, adaptive and stable.

Stability implies that no matter the past resource behavior in terms of
caching efficiency, a change in behavior will *always* influence the
caching system. This is implied by the fact that

  d[c(eff)]
  --------- != 0 for every value of eff
    d[eff]

so no matter what the efficiency, there will always be a change in
caching behavior.

It must be observed that the system "converges" toward optimal caching
efficiency but may never reach it if the behavioral properties of the
resources change faster than the system can adapt.

The parameter 'k' in the definition of c(eff) is the only tunable point
of the system and indicates the slope of the curve for zero efficiency,
thus the speed on how the system converged toward a specific behavior.

Higher values of k should be used when resources rarely change behavior
over time, thus allowing faster convergence. On the other hand, the
system will be slower to adapt to behavioral changes. Lower values of k,
indicate the opposite.

                                     - o -

Ok, I know I've lost many of you at this point and I think the majority
of you left will be pretty disappointed in finding out that the above
doesn't seem anything about caching as you thought about it before.

Let's follow a more historical approach. When I started to design a
cache system, I've tried to come up with a rating model for caching:
each resource has associated a specific "rate" that normally depends on:

  - number of request
  - last requested time
  - size
  - production time

then a "rating function" crunches these numbers and comes up with an a
number associated to this resource. All resources higher than a specific
rate were meaningful for caching, the others were not.

This approach is admittedly smarter than what Cocoon1 does today, but
it's more or less a grown up version of the same concept (yes, Cocoon1
uses a simple rate function to estimate garbage collection of cached
resources).

The real problem relies on the controllability of the system: it's
already hard to come up with a "meaningful" rate function (think about
it: it's extremely hard!), the design of an "adaptive" version is even
worse.

I got lost several times trying to come up with such an 'adaptive rate
function' when I realized that I was looking at the wrong place:
variables such as "last requested time" weren't primary since 'caching
efficiency' is simply 'how much the cache saved us'.

Once caching efficiency is defined, the cache control problem becomes a
maximization effort of a single variable and can be solved much more
easily.

The elegance of the caching model here expressed is represented by the
fact that it's totally agnostic on the 'cost function', which unlike the
'rate function' must estimate only the 'cost' of generating the resource
and not how this will impact the caching efficiency.

So, the cost fuction might be as simple as a measure of the elapsed
time, yet the cache adapts automatically to find a behavior that
converges toward optimal caching efficiency within a short number of
request, yet, it's able to adapt it's behavior if the request costs
change over time.

Moreover, leaving the 'cost estimation' pluggable allows power users to
tune the cache for their specific needs, but without having to mess with
the fedback system that stabilizes the caching behavior (which is
clearly much less obvious to deal with)

                                     - o -

The model has been so far designed around the production of a single
resource.

How does the model change when more than one resource is produced?

It's easy to show that a caching system with infinite memory doesn't
reduce optimal efficiency when multiple resources are handled by the
above caching model. Unfortunately, every physical caching system will
have a finite amount of memory available for cache
storage, this implies an automatic degradation of caching efficiency.

At first, we suppose that the system has enough cache memory to hold the
state information for each resource. This information includes:

  - eff        resource efficiency
  - total      number of requests that underwent caching
  - cached     number of requests that were effectively cached (cached <=
total)
  - C1         cost of production for no caching
  - C2         cost of production for invalid caching (C1 < C2)
  - C3         cost of production for valid caching
  - length     length of cache entry
  - resource   pointer to the cache entry (empty if not present)
  - previous   pointer to the previous resource [eff(current) <
eff(previous)]
  - next       pointer to the next resource [eff(next) < eff(current)]

The collection of resource state composes a linked list ordered by
resource caching efficiency.

Given a finite size for caching storage, we define

  global_cache_efficiency = efficiency_stored / total_efficiency * 100

where

  efficiency_stored = sum of the positive efficiencies
                      of those resources stored in the cache

  total_efficiency = sum of the positive efficiencies
                     of all resources.

The sum is done over positive efficiencies since negative efficiencies
are stored only to provide adaptive information and do not represent the
correct behavior.

A few properties of the global cache efficiency can be observed:

  1) globEff: [0, +inf] -> [0,100]

given

  mem := storage memory for cache

  2) lim         globEff(mem) = 100
     mem -> +inf

  3) lim         globEff(mem) = 0
     mem -> 0

globEff(mem) represents the percentage of the potential cache efficiency
that is possible to take advantage of given the storage capacity.

This has a tremendous impact for site administrators since it gives an
absolute 'metric' to understand how 'userful' is to provide more caching
memory to the system. Other types of metrics can be imagined, such as
providing cache efficiency per megabyte, etc. but the important result
is given by the fact that having *all* the necessary cache information
for all resources, we are able to *forecast* what the caching behavior
can be if parameters such as storage capacity are changed.

It must be noted that such efficiency based forecasting is only an
indicator of behavior since increasing the memory capacity is likely to
increase the caching efficiency, thus reducing globEff() more than
linearly with the amount of memory given. But it still provides a simple
to understand metric for system administrators who can finally have a
way to understand how useful caching has been.

At this point, it is useful to note that eff(r) is the sum of all the
cost saved by cache. In the simple case where cost equals elapsed time,
eff(r) represents the time saved by caching on that particular resource
since the starting of the serving operations. It is possible to design a
metric that indicates how much cost per request has been saved on that
resource, but this requires the addition of another variable which
includes also the count of the total number of requests (both cached and
not cached).

                                     - o -

Another important result of this adaptive caching model is that storage
costs don't need to be specified. This means that memory storage can
happen in both fast or slower memories and the cache adaptive algorithm
will understand automatically if it's still efficient to cache a
resource on a slower memory.

This shows that cache memory can be added as disk storage when RAM is
not available and the cache system will adapt directly, understanding if
caching is still efficient or not depending on the storage costs.

                                     - o -

At this point it's better to sum up a little.

So far we have

  - introduced a general model for caching single resources
  - designed adaptive algorithms to maximise caching efficiency
  - showed the impact of limited storage capacity on global efficiency

but we have not

  - defined the 'cost function'
  - defined the implementation of 'lookup' and 'storing'
  - defined the implementation of 'ergodic validation'

since the general caching model is totally independent on the
implementation of these functional blocks, we have separated the
concerns, creating an independent adaptive caching system that can work
on any implementation.

But to be able to build something that works, we must design how these
functional blocks behave.

                                     - o -

The easiest cost function is "elapsed time": this indicates that the
most expensive resource is computing power and speed is the most
valuable property of functional operation.

A pure-Java implementation is possible with a time granularity of
milliseconds using System.currentTimeMillis(). Smaller granularities
require native hooks either to the system or to the JVM using JNI or the
JVM Native Profiling Interface. For most serving enviornments,
millisecond granularity is sufficient.

Another possible cost function is "time + memory": this calculates as
cost a weighted sum of elapsed time and memory used. This is mainly
useful when memory is limited and expensive (embedded systems) so it
adds to the cost of the production. The cache system will then adapt to
optimize the caching efficiency so that both time and memory are wisely
used. This normally results in slower performance, but wiser memory
consuption. In general, might even turn caching off for all resource if
it's efficient to do so, the beauty of the approach shows here since the
administrator can ignore the problems and let the system adapt.

A pure-Java implementation of such "time + memory" cost function is
almost impossible due to concurrency: there is no way for a particular
thread to know how much memory it has used since the JVM gives
information only about the system free memory and doesn't allow more
granularity than that.

It is possible, on the other hand, to provide native hooks to perform
such thread-based heap use evaluation using the JVM Native Debugging
Interface, even if it might result in slower overall performance
depending on the JVM implementation.

Of course, some system might require very specific cost functions and
for this reason it is suggested to make this pluggable and let each
administrator be able to implement their own.
The only provided cost function is reasonably 'elapsed time'.

                                     - o -

So far we have not touched any cocoon-specific concept: the model is
general enough to be applied to any caching need, in theory, even to CPU
design or digital video decompression.

 >From now on, the results will be Cocoon specific.

NOTE: some of what is described below has been already implemented.

                                     - o -

The act of store looking up is normally associated to hashing: a
resource is mapped to a unique key and this key is used as a hash code
into an hash map to lookup the stored entry associated to it.

This key must be unique in the sense that two different requests that
yields the same key will be treated by the cache system as the exact
same resource. While a good candiate for key at first seems the
requested URI, this is a poor choice if the produced content depends on
other parameters such as user agent, or cookie presence or negotiated
content features, etc..

There are two possible solutions:

  a) include all information available to the producer at request time
into the key

  b) ask the producer to give us the key

The first option reduces effort to the user, but it's very inefficient
since the information necessary to create the key might be very big
(imaging hashing the whole session or context content).

The second option enforces IoC and gives the concern back to the owner
since the producing logic contains also the information about what
parameter influences the uniqueness of a resource. For example, if the
production is function of the URI and User-agent I will hash those only
and ignore the rest of the enviornment information at request time.

       +----------------------------------------------------------+
       | Result #2:                                               |
       |                                                          |
       | Each cacheable producer must generate the unique key of  |
       | the resource given all the enviornment information at    |
       | request time                                             |
       |                                                          |
       |   long generateKey(Enviornment e);                       |
       +----------------------------------------------------------+

Since this method is called synchronously it must be as fast as
possible.

64 bits (long) guarantee uniqueness even for very big sites.

                                     - o -

Once the key is obtained, the cache state information is queried and
caching discrimination is performed using the algorithms described
before.

If caching is performed and the resource entry is present in the cache,
ergodic estimation is performed by calling a method the cacheable
producer must implement.

       +----------------------------------------------------------+
       | Result #3:                                               |
       |                                                          |
       | Each cacheable producer must implement a method that     |
       | evaluates ergodicity given the enviornment information   |
       |                                                          |
       |   boolean stillValid(Enviornment e);                     |
       +----------------------------------------------------------+

This implements synchronous ergodic validation, but in some cases the
producer is able to estimate the ergodic period in advance, allowing the
cache system to avoid this call.

       +----------------------------------------------------------+
       | Result #4:                                               |
       |                                                          |
       | Each cacheable producer must implement a method that     |
       | estimates the ergodicity period given the enviornment    |
       | information                                              |
       |                                                          |
       |   long timeToLive(Enviornment e);                        |
       |                                                          |
       | zero indicates the producer is not able to perform       |
       | the estimation and synchronous ergodic validation must   |
       | performed.                                               |
       +----------------------------------------------------------+

The algorithm for performing ergodic validation is (in pseudo-code):

  if (resource is cached) {
    if (current time < expiration time) {
      get resource from cache
    } else {
      get time2live from producer
      if (time2live != 0) {
        set expiration time = current time + time2live
        produce resource and store it
      } else {
        if (resource is still valid) {
          get resource from cache
        } else {
          produce resource and store it
        }
      }
    }
  } else {
    produce resource and store it
  }

which is totally abstracted on the implementation of the actions

  - "get resource from cache"
  - "produce resource and store it"

                                     - o -

Assuming that memory represents an ordered collection of randomly
accessible bytes, the act of 'storing a resource' and 'getting a
resource' imply the act of 'serialization' and 'deserialization' of the
resource.

In many cases these operations are implicit, or done by the system
(think about Java object serialization capabilities). For Cocoon, the
resources are represented as a stream of SAX events and the operation of
serialization/deserialization must be carefully designed.

A possible choice for such serialization/deserialization action is the
XML compilation method I designed a while back.

It must be noted that global caching efficiency can also be used to
measure the impact of SAX compression on caching. This gives another
degree of freedom for the cache administrator to tune the system since
increasing SAX compression during serialization might slow down the
resource generation, but might save cache memory that might be used to
increase the efficiency of other resources.

The impact of de/serialization costs on resources cannot be judged
independently from the statistical properties of the system, just like
all other costs.

                                     - o -

So far we have analyzed a system where resources where produced by a
single actor, now we concentrate on understanding the impact on what
already designed for resources that are generated from a pipeline.

Let us suppose we have a pipeline

   G ---> T1 ---> T2 --->

where the components are

  G    := generator
  T1   := first transformer
  T2   := second transformer
  ---> := SAX stream

The question is: is it possible to perform ergodic validation on the
resulting pipeline given the ergodic information for the single
components?

Before answering the question, let us observe that given a transformer

  a ---> T ---> b

the result of the production b is can be written as

  b = T(a(t),t)

assuming that the behavior of the function doesn't change with time
(this would require code recompilation!) and separating the changeable
parts, the above can be simplified in

  a ---> T ---> b
         ^
         |
         c

  b = T(a(t),c(a(t),t))

performing an ergodic validation on such a general 'T' requires the
presence of both a(t) and c(t) and normally results in an operation that
is algorithmically equivalent to that of generate the resource b(t), it
would then be neaningless to perform since would make caching efficiency
automatically negative.

Let us note that, if dc/da = 0 we can write that c(t) doesn't depend on
a(t) so we can write

  b = T(a(t),c(t))

this means that, ergodic validation can be performed on c(t) only and
whether c(t) is ergodic, b(t) has the same ergodic properties of a(t).

The importance of this must not be underestimated: in a complex
pipeline, the ability to "atomize" ergodic validation is vital for the
efficiency of the cache. When such a separation is not possible, caching
becomes almost automatically inefficient since the cost of performing
ergodic validation is confrontable with that of generating the request
anew.

       +----------------------------------------------------------+
       | Result #5:                                               |
       |                                                          |
       | Ergodic validation of a pipeline can be atomized if and  |
       | only if the behavior of the component c(t) does not      |
       | depend on the input a(t)                                 |
       +----------------------------------------------------------+

Let's make a few examples:

- for XSLT transformation, a(t) represents the document to transform and
c(t) represents the stylesheet, thus the instructions to the T()
function which is the XSLT behavior that never changes during execution.
Following the rule, the validation is atomic when the stylesheet doesn't
depend on the input. This is true only if no <?xml-stylesheet?> PI is
present in the input.

       +----------------------------------------------------------+
       | Result #6:                                               |
       |                                                          |
       | The use of PI to drive the operation of a pipeline       |
       | breaks, along with SoC, pipeline atomiticy and thus must |
       | be considered harmful.                                      |
       +----------------------------------------------------------+

- for XInclude transformation, c(t) is *always* and by definition
function of a(t) since it provides the information that c(t) will use to
get the external information to include. XInclusion performed as a
transformer can never be atomized.

       +----------------------------------------------------------+
       | Result #6:                                               |
       |                                                          |
       | All those transformers that treat the input as           |
       | for their operation cannot be atomized.                  |
       +----------------------------------------------------------+

This gives us a metric to understand why it's preferable to move those
operations at generation rather than provide XML parameters (This is the
case with the SQLProcessor and LDAPProcessor).

It must be noted that lack of atomicity doesn't necessarely imply "bad
performance" of the pipeline, but rather decreased optimal cache
efficiency. The difference must not be misunderstood.

Let's come back to our pipeline, assuming all our components exhibit
ergodic atomicity, the algorithm to drive the general pipeline:

   G -(r [1] )-> T [1] -(r [2] )-> ... -(r[n])-> T[n] -(r[n+1])->

is

  if (G is not valid) {
    start from G
  } else {
    for (i = 1; i <= n; i++) {
      if (T[i] is not valid) {
        for (j = i; j > 0; j--) {
          if (r[j] is cached) {
            send r[j] thru T[j]
            stop
          }
        }
      }
    }
    r[n+1] is valid
  }

                                     - o -

We have seen how pipeline caching performs optimally only if the
components are atomic and we showed that XInclusion is not. Since we
indicated how XInclusion is vital for content aggregation, we now try to
evaluate the ability to perform XInclusion without breaking pipeline
atomicity.

The first think to observe is that XInclude is, by nature, intrinsically
tied to XML, just like namespaces. In the context of XML publishing,
it's very limiting to consider one without the other.

This seems to imply that XInclusion should be a feature of the system
and not a detachable component.

Let us suppose we included XInclude capabilities right into the SAX pipe
that transports SAX events from one component to the next. This "pipe
connector" must be present because it's the actor tha performs SAX
serialization and deserialization when the resource is stored in or
retrieved from cache.

The regular 'connector' could be drawn as such

   T ------\-/-----> T
           | ^
           | |
           C D
           | |
           V |
          cache

where

  T := transformer
  C := event compiler
  D := event decompiler

by placing internal xinclude capabilities we transform it into something
like

   T ------\-/---[X]--> T
           | ^    ^
           | |    |
           C D    v
           | |   world
           v |
          cache

where

  world := all possible included resources.

Unfortunately, something like this would be totally equivalent of having
an XInclude transformer, thus atomicity will be broken again. The
solution is making the XInclude component cache aware (unlike all other
components which are yes cacheable, but totally cache agnostic and
cannot call the cache directly but enforce IoC).

So

   T ------\-/---[X]--> T
           | ^    ^
           | |    |
           C D    |
           | |    |
           v |    v
         +----------+
         |   cache  |
         +----------+

It's good to note that xincluded resources can be

  - internal:  equivalent of subpipeline mounting
  - external:  equivalent of server side inclusion

For internal resources, the cache is able to perform subrequesting thru
the sitemap and caching the internal resource is a feature we get for
free.

On the other hand, for external resource, the cache doesn't have a
"cacheable" actor to call to estimate ergodicity. I see two possible
solutions:

  - wrap all external resources with a generator, thus make all of them
internal
  - assign a cacheable actor to each external resource based on the
protocol

The first is more uniform and elegant, but requires to create sitemap
entries for each included resource (which is a rather good practice,
IMO, even if places more work on the sitemap administration)

The second is more 'hacky', but doesn't need the definition of a sitemap
entry for included resources.

Since I believe that the sitemap should contain information about all
the resources, even those included to allow the administrators to
control what's going on, the first solutions seems the most elegant so

       +----------------------------------------------------------+
       | Result #7:                                               |
       |                                                          |
       | Xinclusion works on local URI since all the external     |
       | resources must be wrapped by a caching actor and defined |
       | in the sitemap.                                          |
       +----------------------------------------------------------+

                                     - o -

Now that we have found a way to allow xinclusion and maintain component
ergodic atomicity, let's see how the caching algorithm is modified.

Given

  G2 -(r1.1)-+
             |
             v
  G1 --(r1)--+-> T -(r2)->

we immediately note that since (r1) includes the information about
(r1.1), the resource cache state must also contain data about what
resources are included. This information must be updated during
xinclusion time by the XInclude cache-aware filter during production.

So, if (r1) is valid because G1 says so, the cache information of (r1)
is retrieved from cache, and all the included resources are validated.
If all of them are valid and T is valid, then (r2) is valid. Otherwise,
even just one of the included resources is invalid, (r1) is passed again
into the xinclude filter where all the cached resources will be included
right out of cache, while those invalid ones will be regenerated.

       +----------------------------------------------------------+
       | Result #8:                                               |
       |                                                          |
       | Each resource must have associated some cache            |
       | information that contains the resources that are         |
       | included.                                                |
       +----------------------------------------------------------+

                                     - o -

So far we have treated the pipelines as they were composed only by
generators and transformers. In short, each pipeline can be seen as a
SAX producer. This is a slight difference from the original C2 term of
pipeline that included a serializer as well, but it has been shown how
the addition of xinclusion requires the creation of two different terms
to define pipelines that are used only internally for inclusion or those
pipelines that "get out" and must therefore be serialized.

I have the feeling that this requires some sitemap semantics
modification, allowing administrators to clearly separate those
resources who are visible from the outside (thus require a serializer)
and those who are internal only (thus don't require a serializer).

In general, some resource can be used both internally and externally.
This can be done by wrapping an internal resource with a serializer and
present it to the external world under a different URI.

The only thing that is left to design is how serialized content is
cached.

Generally speaking, an externally-visible pipeline such as

  G ---> T ---> S --->

could be cached just like an internal one as long as no SAX compilation
takes place after the serializer. The algorithms for ergodicity
validation remain the same.

But serializers have particular behavioral properties that allow us to
obtain a general result. Unlike transformers, serializers have a
functional behavior of

  a ---> S ---> b

where

  b = S(a(t),t)

but unlike transformers, serializers must not have external instructions
but rather be code implementation of 'adapting' functions between the
XML world and the outside octet-stream-based world. Thus, since time
dependency here means code recompilation and all the instructions are
contained into code we can assume that

  b = S(a(t))

where b maintains the exact same ergodic properties of a.

This means that caching both a and b is useless since if b is valid so
is a, and if a is invalid both a and b must be invalidated. This shows
that the cache entry for 'a' is *never* used!

It is a general rule to say that for a complete pipeline

  G -r [1] -> T [1] -r [2] -> ... -r[n]-> T[n] -r[n+1]-> S -r[n+2]->

r[n+1] must be stored if and only if the pipeline (G,T1...Tn) is used as
an internal included resource, otherwise it's optimally efficient to
discard it alltogether.

                                     - o -

We have reached the point where the Cocoon cache system is complete in
functionality, content aggregation aware, totally adaptive on memory
consumption and efficienty maximization.

A final note must be made: the request that is most efficiently
processed is the one that is not processed at all!

The HTTP/1.1 specification [RFC 2616] makes a great effort to design a
way to allow a distributed information system to improve performance
thru the use of caching. At this point, it must be noted that even if
both this note and the HTTP spec makes use of the term caching, they
mean different things: we refer to caching of internal resources while
HTTP caching works on transparently performe serving on behalf of
another server, thus reducing load on the server.

Unfortunately, total transparency cannot be performed because the proxy
and the server don't share the same concerns and don't have the same
information on the requested resource. In fact, only the server is able
to create the resource: the proxy can only decide whether or not the
local copy that it might have previously store is 'meaningful' enough
for the client to avoid making the server request and returning the
local copy.

This implies that a good proxying behavior is dependent on server's
'proxy-friendlyness', which assumes that the server provides the proxy
enough information to perform its operation well. Otherwise (and
unfortunately, this is almost always the case!!) proxies simply work as
gateways that are rarely able to cache information.

So, let's look at the behavior that HTTP/1.1 reccomends for
proxy-friendly servers by starting to note that, just like our caching
model, there are two ways of performing ergodic estimation:

  - asynchronous, thru the use of expiration time or maximum age
  - synchronous, thru the use of validation-only requests

Being friendly for asynchronous operation means simply to provide
maximum age information with the header

  Cache-Control: max-age=<delta seconds>

whenever possible.

For Cocoon operation, this means that this header will be automatically
sent if and only if all resources that generate the response are
cacheable and provide a maximum age which is higher than zero.

Of course, the max-age will be the minimum of the many maximum ages of
the resources that form the response.

It must be noted that normal operations like XSLT transformation cannot
provide a maximum age because there is no information on when the
stylesheet can be changed. On the other hand, it's not normally harmful
to have old stylesheets, so it's up to the administrator to tune the
caching system for their needs.

HTTP/1.1 also provides a mean for validating a resource that is cached
in the proxy, much like our caching system does when synchronously calls
all cacheable components. This method works by sending a normal request
to the server, along with some special "validator" headers that were
previously sent by the server during the response generation.

The server can use these 'validator' headers to gain knowledge about the
entry that is located in the proxy and perform local processing to
estimate the validation of that particular cached response. If the
server rules estimate that the proxied response is still valid, it
returns an empty response with a 304 (Not Modified) status code,
otherwise it returns the full response along with the updated
'validator' headers.

[please, read section 13 of the RFC 2616 for more info]

The validator header that is normally used is "Last-Modified" but any
request header or combination of headers can be used to gain information
on the validity of the request. It is thus possible, when a request
comes and the cocoon cache system returns that the stored request is
still valid, to return a 304 instead of sending the full body (even if
from cache).

This reduced network congestion and increases caching efficiency since
the expense of sending the bytes to the client (even if cached) can be
avoided.

As a general consequence, this implies that storing the result of
serialization can always be avoided if Cocoon is wrapped by an HTTP/1.1
proxy that is able to perform response validation.

It is possible to obtain information about 'proxying' using the 'Via:'
HTTP header that (should!) indicate the proxy chain. Unfortunately,
there is no way to automatically tell if Cocoon is completely wrapped by
a proxy or if proxying is performed down the road. It is therefore
advisable that Cocoon implement a caching behavior for serialized
responses depending on the presence of a proxy wrapper (storing or not
the serialized response) which can be indicated only by using a
configuration directive.

Note: Apache's mod_proxy distributed with the 1.3.xx series implements a
HTTP/1.0 proxy and for this reason it's not able to perform response
validation as described here. On the other hand, Squid is partially
compliant (can any proxy pretend serious HTTP/1.1 compliance today?) and
might be able to perform such proxying (sorry, haven't tested it yet!).
  I don't know about Apache 2.0.xx mod_proxy.

Of course, proxying can be cascaded, so a local transparent proxy
doesn't impact on the performance of remote proxies and doesn't impact
on proxy-friendlyness at all.

                                     - o -

Final note
----------

All the results expressed in this RT are 'indicating' and do not
represent anything buy my personal research and invention. This does
*NOT* imply that any of the results find in this note have to be
implemented in Cocoon automatically, not at all! I've lost my status of
active developer for this project and I'm therefore 'suggesting' some
possible solutions to some problems: in case better ideas or results
emerge from the community, I'm more than willing to adapt results based
on that input. In fact, I'm sending this exactly to start a healthy
discussion.

As for algorithms and pseudo-code, the above expresses a visual mean to
transfer information about the logic, and *DOES NOT* want to be
exhaustive or conclusive in any way, rather a small indication of the
direction to take.

As for java interfaces and methods, they must be considered suggestions
only and nothing more than that. If interfaces were already implemented
that contain the same concepts or part of them, I'm more than willing to
adapt to anything that was previously designed as long as it doesn't
break the overall model and concepts express herein.

Awaiting for your comments, I thank you all for the patience :)

--
Stefano.


Mime
View raw message