From "Berin Loritsch" <blorit...@apache.org>
Subject RE: [Design] ContainerManager is under fire--let's find the best resolution
Date Thu, 06 Jun 2002 17:56:02 GMT
> From: Leo Sutic [mailto:leo.sutic@inspireinfrastructure.com] 
> 
> > From: Berin Loritsch [mailto:bloritsch@apache.org]
> 
> > > Assume you have a CM that automatically reclaims all components 
> > > after each request. That is, for Cocoon, when the request comes in,
> > > the CM starts keeping track of what components have been taken out,
> > > and when the request has been processed, they are release()'d (or 
> > > equivalent method).
> > > 
> > > Now introduce pooled components.
> > > 
> > >     If more than pool-max components are looked-up during 
> > >     the request you are not performing well, as you empty
> > >     the pool.
> > 
> > I thought I already did introduce pooled components.  It's
> > really simple.  The GC process for components releases 
> > them--just like we currently do. The GC process is done 
> > after the Response is committed.
> 
> The scenario was when more than pool-max lookups had been
> done before the GC kicks in. Suppose you have a pool-max of 3:
> 
>    public void handleRequest () {
>       someMethod ();
>       someMethod ();      
>       someMethod ();
>       someMethod ();
>    }

And this is different from the current state of affairs, how?
If a request requires 5 transformer instances, and you have your
pool max set to 3, you will still experience a slowdown.  That
is no different when components are automatically released once
the request is handled.


>    public void someMethod () {
>       manager.lookup (ExpensiveButPooledComponent.ROLE);
>       ...
>    }
> 
> With an explicit release() this could be made not to drain 
> the pool. With GC you can not, unless you set the timeout 
> ridiculously low.

With an explicit release() you are in the same boat as the GC
method.  For Cocoon we have a really simple lifetime for
requested components: the length of a request.  It's not that
hard to implement or to comprehend.  It is also pretty easy
to manage the instances available.
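
To make the request-length lifetime concrete, here is a minimal sketch.
It is not Avalon or Cocoon code: RequestScopedPool, lookup(), and
endRequest() are hypothetical names, and it assumes a single thread
handles each request from start to finish.  Everything looked up during
the request goes back to the pool in one step once the response is
committed:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical sketch; none of these names come from Avalon or Cocoon.
class RequestScopedPool {
    private final int poolMax;
    private final Deque<Object> idle = new ArrayDeque<>();
    // Components handed out to the request running on the current thread.
    private final ThreadLocal<List<Object>> inUse =
            ThreadLocal.withInitial(ArrayList::new);
    private int created = 0;

    RequestScopedPool(int poolMax) {
        this.poolMax = poolMax;
        for (int i = 0; i < poolMax; i++) {
            idle.push(create());
        }
    }

    // Stand-in for building an expensive component.
    private Object create() {
        created++;
        return new Object();
    }

    // Hands out an idle instance, or builds a new one when the pool is drained.
    synchronized Object lookup() {
        Object component = idle.isEmpty() ? create() : idle.pop();
        inUse.get().add(component);
        return component;
    }

    // Called by the container after the response is committed.
    synchronized void endRequest() {
        List<Object> taken = inUse.get();
        for (Object component : taken) {
            if (idle.size() < poolMax) {
                idle.push(component);   // instances beyond poolMax are dropped
            }
        }
        taken.clear();
    }

    synchronized int idleCount() { return idle.size(); }

    synchronized int createdCount() { return created; }
}
```

With a poolMax of 3 and four lookups in one request, the fourth lookup
has to build a new instance, which is the slowdown under discussion.
endRequest() then refills the pool no matter what the caller forgot
to release.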

Many of the components that are currently pooled can be moved
to a PerThread policy.  All we need is a ThreadLocal variable
to hold the instance of the object.  This accounts for a
large majority.  Unfortunately, the core components in Cocoon
have an interface that is not friendly to this, and we need a unique
instance for every request.  AKA pooling.
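
A rough sketch of what a PerThread policy could look like follows.
PerThreadHolder is a hypothetical name and not an existing Avalon
class; the only real API used is java.lang.ThreadLocal.  Each thread
lazily gets its own instance and keeps it, so there is nothing to
release:

```java
import java.util.function.Supplier;

// Hypothetical holder illustrating a per-thread component policy.
class PerThreadHolder<T> {
    private final ThreadLocal<T> instance;

    PerThreadHolder(Supplier<T> factory) {
        // The factory runs at most once per thread, on first lookup.
        this.instance = ThreadLocal.withInitial(factory);
    }

    // Always returns the calling thread's own instance.
    T lookup() {
        return instance.get();
    }
}
```

The instance count is bounded by the number of threads rather than by
the number of lookups, which is why no explicit release() is needed
for components that fit this policy.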

I am also advocating that the current pipeline component interfaces
be changed.  The Generator, Transformer, and Serializer implement
SAX methods--which is mixing concerns.  Instead, they should return
an object that implements them.

Now, we can set it up so that we can have a new version of
the interface without breaking backwards compatibility with
current components--but that is a subject for another thread.

There is something inherently wrong when the only option available
to you is to pool the components or create them new every time.
The interface is wrong.  It adds overhead and long and drawn out
witch hunts finding where the component references are leaking.

If we can design the components so that they can either be shared
among all threads (optimal), or at the very least ensure that one
instance per thread is sufficient then we have something where
the framework is no longer the issue and we no longer need the
release() mechanism.

The issues come with the forcing of Poolable.  That decision should
be something that the container can decide to implement if it wants
to--possibly to save instances so that the number of instances of
a component are fewer than the number of threads.

However, the interfaces for the Cocoon pipeline components are broken.
A Generator should return an XMLSource, a Transformer should return
an interface that merges XMLSource and ContentHandler, and a Serializer
should return a ContentHandler.

That way we can have something as simple as


XMLSource source = generator.getXMLSource("file", uri);
XMLSource trans = source;
Iterator xformers = transformers.iterator();

while ( xformers.hasNext() )
{
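    // each entry carries the type and uri of one transformer
    Struct entry = (Struct)xformers.next();
    // newTrans is both an XMLSource and a ContentHandler
    XMLSource newTrans = transformers.getPipeline(entry.type, entry.uri);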
    trans.setContentHandler(newTrans);
    trans = newTrans;
}

trans.setContentHandler( serializer.getHandler("svg2png") );

source.execute();

As the ContentHandler.endDocument() is called on each item, they are
automatically returned to their pools.  It's not bad.  Not to mention,
the current-style generators, transformers, and serializers would
be able to be used as the return values--so that everyone's hard work
is not wasted.
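
The return-on-endDocument() idea might be sketched like this.
HandlerPool and PooledHandler are hypothetical names used only for
illustration; the one real API is the SAX DefaultHandler base class,
which supplies no-op implementations of the ContentHandler methods:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pool of SAX handlers.
class HandlerPool {
    private final Deque<PooledHandler> idle = new ArrayDeque<>();

    // Reuses an idle handler if there is one, otherwise builds a new one.
    synchronized PooledHandler acquire() {
        PooledHandler handler = idle.poll();
        return handler != null ? handler : new PooledHandler(this);
    }

    synchronized void release(PooledHandler handler) {
        idle.push(handler);
    }

    synchronized int idleCount() {
        return idle.size();
    }
}

// Hypothetical handler that hands itself back when the document ends.
class PooledHandler extends DefaultHandler {
    private final HandlerPool owner;

    PooledHandler(HandlerPool owner) {
        this.owner = owner;
    }

    @Override
    public void endDocument() {
        // The SAX stream is finished, so this handler is free again.
        owner.release(this);
    }
}
```

The code that builds the pipeline never calls release(); the end of
the SAX stream is the signal.  A real handler would also need to reset
its own state before being reused.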


> > The GC routine for the container collects any components that
> > need to be reclaimed into the pool.  As a result we will have 
> > fewer dangling components than is currently possible.  Right 
> > now, we have the equivalent of C++ memory allocation.  The 
> > onus is on the developer to get it right.  The GC brings the 
> > component into the Java age where GC is the norm.  You don't 
> > have to worry about deleting everything you new in Java, the 
> > user doesn't have to worry about releasing everything you lookup.
> 
> Well that's fine in theory, but in practice you will end up tweaking 
> and tweaking your GC timeouts and pool sizes, getting bizarre 
> errors along the way.

You already have to screw with pool sizes.  The GC element is not
going to make things less predictable on that front.  In fact, there
is a good chance it will make things *more* predictable.

As to timeouts, we can use one policy per container type.  For
example, Cocoon would benefit from a request based approach.  Other
containers may have to use a timeout based approach.  It's up to the
container.  Are timeouts sufficient?  No.  Does GC add additional
complexity for the container?  Yes.  Does it help the developer?
Absolutely.


> > Example:
> > 
> > Proxy that releases the component instance after a timeout of
> > 100 ms will wait as a container of nothing until it is either 
> > GC'd by the JVM or until an interface method has been called. 
> > In that case, the call blocks until a new Component instance 
> > is pulled from the pool.  The method is then called.
> 
> But component state is lost in the "refresh". Meaning that for 
> a SAX transformer or *any other component with state* you have 
> screwed up the processing. (So don't allow components with 
> state, then - well, then they are all ThreadSafe and we do not need 
> pools.)

See above.  The Cocoon pipeline component interfaces are really
screwed up in this respect.  A component's state should be sufficient
per thread.  Anything that is more granular than that needs a
different treatment.


> The basis of GC is that you can unambiguously tell when an 
> object is no longer used - when it can not possibly be used. 
> The speedups we have in pooling is due to explicitly telling 
> the container that this object can be reclaimed, thus keeping 
> the object count low.

In Cocoon we have the advantage of knowing that.  A pipeline
component cannot possibly be used past the processing of a request.
It makes for a really simple GC mechanism.


> > I do not want any more work on the client.  Let the container
> > be smart and the client be dumb.
> 
> Agreed! But what you propose is simply too complex to ever 
> work in practice. There are just too many restrictions on how 
> a component may behave, too many parameters for the GC 
> policy. Too much that can go wrong.

I am finding more and more that what people are calling components are
nothing more than Objects that observe the Bridge pattern.  They
implement an interface, introduce a few lifecycle methods, etc.
If they are objects then they should be treated as such.  If a
pooled object requires an explicit return to a pool, then that
decision should be made in the GeneratorManager, or the
TransformerManager, etc.  Not in the core lookup mechanism.


> Add in different GC policies for different containers and
> you end up with making the whole thing more complex instead of 
> less.
> 
> Summary: GC of components...
> 
>      ...means that components may not have state and be pooled.

It means that the state has to be at least consistent within a thread.
Of course, the proxy can maintain the state as well--but that is more
complexity as well....

>      ...means that you always risk draining the pool.

That is the notion that I am trying to dispel.  It means that there
are fewer instances of memory leaks in Cocoon because what one developer
forgot to release is not going to hurt everyone else.

>      ...means a load of GC policy parameters for the client.

? I don't get this at all.

GC policy is a function of the container--the client has no say in
its use.  The JVM does not have programmatic hooks to allow you to
modify at runtime what GC policy it has.  The fact that it has a
System.gc() method is too much IMO to give to a client.

