cocoon-dev mailing list archives

From "Berin Loritsch" <blorit...@apache.org>
Subject RE: [Design] ContainerManager is under fire--let's find the best resolution
Date Fri, 07 Jun 2002 13:08:14 GMT
> From: Vadim Gritsenko [mailto:vadim.gritsenko@verizon.net] 
> 
> > From: Berin Loritsch [mailto:bloritsch@apache.org]
> 
> <snip-a-lot/>
> 
> ASSUMPTION:
> 
>   Poolable component is a component with high instantiation 
> cost and state thus it can not be used in several threads 
> simultaneously.
> 

Don't forget Per-Thread policy (not in ECM, but in new Fortress
package).  That instantiates one instance of a component per
thread, and you don't need pooling semantics to do it.

So that assumption is too broad.
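
As a rough illustration of the per-thread policy, here is a minimal
sketch in plain Java.  It is not the Fortress implementation;
PerThreadHolder and lookup() are hypothetical names, and the sketch
simply leans on java.lang.ThreadLocal to hand each thread its own
instance without any pool.

```java
import java.util.function.Supplier;

// Hypothetical holder, NOT the Fortress API: one component instance per thread.
final class PerThreadHolder<T> {
    private final ThreadLocal<T> instances;

    PerThreadHolder(Supplier<T> factory) {
        // The factory runs at most once per thread, on that thread's first lookup.
        this.instances = ThreadLocal.withInitial(factory);
    }

    // Repeated lookups on the same thread return the same instance;
    // a different thread gets a different instance.
    T lookup() {
        return instances.get();
    }
}
```

Nothing is ever released here: the instance lives as long as its
thread does, which is why no pooling semantics are needed.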

Poolable components are for components that must be unique to
each lookup.  It is these types of components that should be
changed.  The Transformer is a perfect example of that.


> > However, the interfaces for the Cocoon pipeline components are broken.
> > A Generator should return an XMLSource, a Transformer should return an
> > interface that merges XMLSource and ContentHandler, and a Serializer
> > should return a ContentHandler.
> 
> Right now Transformers are poolable. They have a state and they are
> (supposedly) heavy to new().

Some of them are.  Their heaviness comes from the lifecycle they
must go through before they are ready to be used.  Some trivial
Transformers that do not need context information, or to look up
other components, or to be configured are better off new()ing
every time.


> If you change the Transformer interface to return only
> XMLSource/ContentHandler, all the logic and state the Transformer
> has moves into this XMLSource.

The state information moves into an artifact of the runtime
system.  This is as it should be.  We can query the component
for a unique instance of the XMLPipeline (merging of XMLSource
and ContentHandler)--opening the door for other types of
performance-enhancing opportunities.  Once the XSLT transformer has
generated the template, it can use a cached version of it--and the
logic makes sense.  Consider this use case:

--------Current State----------

generator.setup(....); // finds out the source info, etc...
transformer.setup(....); // finds out the source info, etc...
serializer.setup(....);

transformer.setContentHandler( serializer );
generator.setContentHandler( transformer );
generator.execute();

-------New Way-----------

XMLSource source = generator.getSource( type, .... ); // can cache at this point
XMLPipeline pipe = transformer.getPipeline( type, .... ); // can cache at this point
ContentHandler sink = serializer.getSink( type, .... ); // can cache at this point..if necessary

source.setContentHandler( pipe );
pipe.setContentHandler( sink );
source.execute();

-------------------------

It also helps in assembling the pipeline dynamically, with fewer
lookups.
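
To make the wiring concrete, here is a small runnable sketch using the
SAX classes that ship with the JDK.  The interface names XMLSource and
XMLPipeline come from the proposal; everything else (XMLProducer,
OneElementSource, UpperCasePipeline, RecordingSink, PipelineDemo) is
hypothetical.  One assumption to flag: execute() is kept on XMLSource
only, with setContentHandler() factored into a shared XMLProducer, so
that a pipeline stage is not itself "executable".

```java
import java.util.Locale;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical interfaces: names from the proposal, method split assumed.
interface XMLProducer {
    void setContentHandler(ContentHandler next);
}

interface XMLSource extends XMLProducer {
    void execute() throws SAXException;
}

interface XMLPipeline extends XMLProducer, ContentHandler {
}

// Source artifact: emits a single empty <doc> element.
final class OneElementSource implements XMLSource {
    private ContentHandler next = new DefaultHandler();

    @Override
    public void setContentHandler(ContentHandler next) {
        this.next = next;
    }

    @Override
    public void execute() throws SAXException {
        next.startDocument();
        next.startElement("", "doc", "doc", new AttributesImpl());
        next.endElement("", "doc", "doc");
        next.endDocument();
    }
}

// Pipeline artifact: upper-cases element names, forwards all other events.
final class UpperCasePipeline extends XMLFilterImpl implements XMLPipeline {
    @Override
    public void startElement(String uri, String local, String qName, Attributes atts)
            throws SAXException {
        super.startElement(uri, local.toUpperCase(Locale.ROOT),
                qName.toUpperCase(Locale.ROOT), atts);
    }

    @Override
    public void endElement(String uri, String local, String qName) throws SAXException {
        super.endElement(uri, local.toUpperCase(Locale.ROOT), qName.toUpperCase(Locale.ROOT));
    }
}

// Sink artifact: records the element events it receives.
final class RecordingSink extends DefaultHandler {
    final StringBuilder events = new StringBuilder();

    @Override
    public void startElement(String uri, String local, String qName, Attributes atts) {
        events.append('<').append(qName).append('>');
    }

    @Override
    public void endElement(String uri, String local, String qName) {
        events.append("</").append(qName).append('>');
    }
}

final class PipelineDemo {
    // Wires source -> pipe -> sink exactly as in the "New Way" listing.
    static String run() {
        XMLSource source = new OneElementSource();
        XMLPipeline pipe = new UpperCasePipeline();
        RecordingSink sink = new RecordingSink();
        source.setContentHandler(pipe);
        pipe.setContentHandler(sink);
        try {
            source.execute();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return sink.events.toString();
    }
}
```

All per-run state lives in the three artifacts, so whatever handed
them out holds no state of its own.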

The fact that we work with fairly generic types allows us the ability
to take advantage of generative programming such as using BCEL to
generate a class that spits out SAX events (kind of like XSP but
better)--and have that done by the caching system.  The Generator
component's responsibility then becomes how to manage these artifacts
rather than how to actually do the work.

The new way would probably add a GeneratorManager for this purpose.
However, the artifact returned is preinitialized with everything it
needs.  The GeneratorManager, TransformerManager, and SerializerManager
can all take care of usage semantics if they handle pooled items.

Otherwise stated, it would be *more* correct to return artifacts to a
specific manager than it would be to return them to a lookup mechanism.
What we want to restore is the separation of concerns for the lookup
mechanism.  The CM was only designed to be a lookup mechanism--not a
container.
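
A minimal sketch of what "returning an artifact to its manager" could
look like, assuming a simple free-list pool.  ArtifactManager,
acquire(), release() and createdCount() are hypothetical names, not an
existing Avalon or Cocoon API; a TransformerManager would be one such
manager specialized for pipeline artifacts.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

// Hypothetical manager: hands out artifacts and takes them back itself,
// so the lookup mechanism never has to act as a container.
final class ArtifactManager<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final Supplier<T> factory;
    private int created;

    ArtifactManager(Supplier<T> factory) {
        this.factory = factory;
    }

    // Reuse an idle artifact if one exists, otherwise build a new one.
    synchronized T acquire() {
        T artifact = idle.poll();
        if (artifact == null) {
            artifact = factory.get();
            created++;
        }
        return artifact;
    }

    // Artifacts come back to the manager that issued them.
    synchronized void release(T artifact) {
        idle.push(artifact);
    }

    // Number of artifacts built so far (useful for checking reuse).
    synchronized int createdCount() {
        return created;
    }
}
```

The manager is the only party that knows whether an artifact is
pooled, cached, or created fresh; callers just acquire and release.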

> 
> Thus, XMLSource becomes heavy and Transformer light.
> Obviously, Transformer becomes ThreadSafe (which is good) and
> XMLSource must be made Poolable (it's heavy, it is stateful).


Not necessarily--there are other possibilities for optimization
at a systemic level that would not otherwise present themselves.


> Instead of having one component we ended up with two. Please 
> tell me I see things wrongly.
> 
> <snip what="simple pipeline"/>

You end up with one management component, and artifacts it returns.
Those artifacts can be cached results, compiled XML streams, or
C2 Generators, etc.  We are no longer limited by our architecture.
We can have more intelligent operations on the pipeline components.



> > As the ContentHandler.endDocument() is called on each item, they are
> > automatically returned to their pools.
> 
> Two issues on this one:
> 
> 1. endDocument might never be called. I can discard a
> component after evaluating its cache ID or cache validity.
> 
> 2. endDocument does not necessarily indicate that I'm done
> with this component. Simple example: you are using a serializer
> to serialize an XML fragment 100 times. It would be logical to
> make a loop:
> 
> serializer = lookup();
> for(;;){
>   serializer.setDestination();
>   serializer.startDocument();
>   ...
>   serializer.endDocument();
> }
> 

Wrong application.  It is a Transformer's job to modify the XML
so that you have an XML fragment repeating 100 times.  The
Serializer should only operate on the XML given to it.

A serializer should _*never*_ modify the content of the XML.  It
can only modify the binary stream's representation of it.


> > As to timeouts, we can use one policy for the container type.  For 
> > example, Cocoon would benefit from a request based approach.
> 
> What if processing continues after sending response?
> I.e., after endDocument() on serializer, some work is done in 
> transformer? Like invoking other serializer?

Then you have broken Cocoon's design.  A Transformer does not invoke
a Serializer.  Ever.  It is the Sitemap's responsibility to manage all
pipelines--whether they have branched or not.  Once all processing for
a request is done--and the sitemap or at least the Cocoon container
knows this unequivocally--then it can reclaim the components.
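
One way to picture request-based reclamation is a scope object that
the container opens per request and closes when processing is done.
This is only a sketch under that assumption; RequestScope and track()
are hypothetical names, and the real sitemap would decide when the
scope ends.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical per-request scope: remembers how to give back every
// artifact handed out during the request and does so when closed.
final class RequestScope implements AutoCloseable {
    private final Deque<Runnable> reclaimers = new ArrayDeque<>();

    // Register an artifact together with the action that returns it
    // to whoever issued it (for example, its manager).
    <T> T track(T artifact, Consumer<? super T> giveBack) {
        reclaimers.push(() -> giveBack.accept(artifact));
        return artifact;
    }

    // Called by the container once the request is fully processed.
    // Artifacts are reclaimed in reverse order of acquisition.
    @Override
    public void close() {
        while (!reclaimers.isEmpty()) {
            reclaimers.pop().run();
        }
    }
}
```

Used with try-with-resources, the component code never calls
release() itself; the container reclaims everything at the end of the
request.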


> > Other
> > containers may have to use a timeout based approach.  It's up to the 
> > container.  Are timeouts sufficient?  No.  Does it add additional 
> > complexity for the container? Yes.  Does it help the developer? 
> > Absolutely.
> 
> There are situations when a transaction takes hours to process
> (I do not mean a DB transaction here). How will this be handled?

Wow.  Hours?  Then you need to think of a different way of handling that
transaction.  That is a deeper design issue that needs serious thought
for that application.


> > > But component state is lost in the "refresh". Meaning that for a SAX
> > > transformer or *any other component with state* you have screwed up
> > > the processing. (So don't allow components with state, then - well,
> > > then they are all ThreadSafe and we do not need pools.)
> > 
> > See above.  The Cocoon pipeline component interfaces are really
> > screwed up in this respect.  A component's state should be sufficient
> > per thread.
> 
> A thread can require several components of the same type to do
> its work. How will this be handled?

Use the ***Manager approach above.  If you need a unique instance of
a component for each lookup, then there is probably something wrong
in your design.



> > Anything that is more granular than that needs a
> > different treatment.
> 
> What could it be?

The ***Manager approach outlined above.


> > > The basis of GC is that you can unambiguously tell when an object is
> > > no longer used - when it can not possibly be used. The speedups we
> > > have in pooling is due to explicitly telling the container that this
> > > object can be reclaimed, thus keeping the object count low.
> > 
> > In Cocoon we have the advantage of knowing that.  A pipeline component
> > cannot possibly be used past the processing of a request.
> 
> Some transformers use an instance of a serializer to do their work.
> It could be looked up on startup and returned on shutdown (to
> speed up processing; right now manager.release() is quite an
> expensive operation), and would not depend on request/response cycles.

:) And now you are getting why we need to design our components
so that we do not need to release() them.  BTW, the Fortress
container has a much shorter release() cycle because it handles
the logic asynchronously.  It may take a little longer getting the
instance into the pool, but it doesn't affect the critical path.
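
For readers who want to see the idea of an asynchronous release(),
here is a minimal sketch.  It is not how Fortress is implemented;
AsyncReleasePool and its methods are hypothetical, and a single
background thread from java.util.concurrent stands in for whatever
mechanism the container really uses.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical pool: release() only queues work; the reset and the
// return to the idle list happen on a background thread.
final class AsyncReleasePool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final ExecutorService recycler = Executors.newSingleThreadExecutor();
    private final Supplier<T> factory;
    private final Consumer<T> reset;

    AsyncReleasePool(Supplier<T> factory, Consumer<T> reset) {
        this.factory = factory;
        this.reset = reset;
    }

    // Take an idle instance if one is ready, otherwise create one.
    T acquire() {
        T item = idle.poll();
        return item != null ? item : factory.get();
    }

    // Returns immediately; the caller's critical path does not pay
    // for the (possibly expensive) reset.
    void release(T item) {
        recycler.execute(() -> {
            reset.accept(item);
            idle.offer(item);
        });
    }

    // Stop accepting releases and wait for pending ones to finish.
    boolean shutdownAndWait(long millis) {
        recycler.shutdown();
        try {
            return recycler.awaitTermination(millis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The trade-off is the one described above: an instance may reach the
pool a little later, so a concurrent acquire() might create a fresh
one instead of reusing it.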

However, if a Transformer directly uses a Serializer then something
is wrong.  That was never the intention of the Cocoon component
model.


---------------------------------------------------------------------
To unsubscribe, e-mail: cocoon-dev-unsubscribe@xml.apache.org
For additional commands, email: cocoon-dev-help@xml.apache.org

