cocoon-dev mailing list archives

From "Vadim Gritsenko" <vadim.gritse...@verizon.net>
Subject RE: [Design] ContainerManager is under fire--let's find the best resolution
Date Fri, 07 Jun 2002 14:42:07 GMT
> From: Berin Loritsch [mailto:bloritsch@apache.org]
> 
> > From: Vadim Gritsenko [mailto:vadim.gritsenko@verizon.net]
> >
> > > From: Berin Loritsch [mailto:bloritsch@apache.org]
> >
> > <snip-a-lot/>
> >
> > ASSUMPTION:
> >
> >   Poolable component is a component with high instantiation
> > cost and state thus it can not be used in several threads
> > simultaneously.
> >
> 
> Don't forget Per-Thread policy (not in ECM, but in new Fortress
> package).  That instantiates one instance of a component per
> thread, and you don't need pooling semantics to do it.

Mmm... This can help in some situations...
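
To make the Per-Thread idea concrete, here is a minimal sketch using plain `ThreadLocal`. It is not Fortress code: `PerThreadHolder` and `PerThreadDemo` are hypothetical names, and `StringBuilder` merely stands in for a stateful component.

```java
import java.util.function.Supplier;

// Hypothetical per-thread holder: one lazily created instance per thread.
final class PerThreadHolder<T> {
    private final ThreadLocal<T> instances;

    PerThreadHolder(Supplier<T> factory) {
        this.instances = ThreadLocal.withInitial(factory);
    }

    // Every lookup on the same thread returns the same instance; no release() is needed.
    T lookup() {
        return instances.get();
    }
}

final class PerThreadDemo {
    // Returns true if a second thread sees a different instance than the calling thread.
    static boolean otherThreadGetsOwnInstance(PerThreadHolder<StringBuilder> holder) {
        StringBuilder mine = holder.lookup();
        StringBuilder[] theirs = new StringBuilder[1];
        Thread t = new Thread(() -> theirs[0] = holder.lookup());
        t.start();
        try {
            t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return theirs[0] != mine;
    }
}
```

Note that two lookups on the same thread return the same object, so this policy alone does not cover a thread that needs two independent instances of one component type.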


> So that assumption is too broad.
> 
> Poolable components are for components that must be unique to
> each lookup.  It is these types of components that should be
> changed.  The Transformer is a perfect example of that.

If we remove the ContentHandler interface from Transformer, it will become
ThreadSafe, so the Per-Thread policy won't be used here.


> > > However, the interfaces for the Cocoon pipeline components
> > > are broken.
> > > A Generator should return an XMLSource, a Transformer
> > > should return an
> > > interface that merges XMLSource and ContentHandler, and a
> > > Serializer should return a ContentHandler.
> >
> > Right now Transformers are poolable. They have a state and they are
> > (supposedly) heavy to new().
> 
> Some of them are.  Their heaviness comes from the lifecycle they
> must go through before they are ready to be used.  Some trivial
> Transformers that do not need context information, or to lookup
> other components, or to be configured are better off new()ing
> every time.

Ok.


> > If you change the Transformer interface to return only
> > XMLSource/ContentHandler, all the logic and state the Transformer
> > has moves into this XMLSource.
> 
> The state information moves into an artifact of the runtime
> system.  This is as it should be.  We can query the component
> for a unique instance of the XMLPipeline (merging of XMLSource
> and ContentHandler)--opening the door for other types of performance
> enhancing opportunities.  Once the XSLT transformer has generated
> the template, it can use a cached version of it--and the logic
> makes sense.  Consider this use case:
> 
> --------Current State----------
> 
> generator.setup(....); // finds out the source info, etc...
> transformer.setup(....); // finds out the source info, etc...
> serializer.setup(....);
> 
> transformer.setContentHandler( serializer );
> generator.setContentHandler( transformer );
> generator.execute();
> 
> -------New Way-----------
> 
> XMLSource source = generator.getSource( type, .... ); // can cache at this point
> XMLPipeline pipe = transformer.getPipeline( type, .... ); // can cache at this point
> ContentHandler sink = serializer.getSink( type, .... ); // can cache at this point..if necessary
> 
> source.setContentHandler( pipe );
> pipe.setContentHandler( sink );
> source.execute();
> 
> -------------------------
> 
> It also helps in assembling the pipeline dynamically, with fewer
> lookups.
> 
> The fact that we work with fairly generic types allows us the ability
> to take advantage of generative programming such as using BCEL to
> generate a class that spits out SAX events (kind of like XSP but
> better)--and have that done by the caching system.  The Generator
> component's responsibility then becomes how to manage these artifacts
> rather than how to actually do the work.

And the work itself will be done in XMLSource/XMLPipeline/XMLSink. I got
that part. I hope performance will not be sacrificed by this move (you
will be new()ing these objects all the time).
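
One way to picture the split between a shared component and a cheap per-use artifact is with standard JAXP types. This is only a sketch under assumptions: `CachedXsltComponent` and `CachedXsltDemo` are hypothetical names, JAXP's `Templates` stands in for the cached stylesheet, and `TransformerHandler` stands in for the proposed XMLPipeline. None of this is the Cocoon API.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical shared component: it holds only the compiled stylesheet.
final class CachedXsltComponent {
    private final SAXTransformerFactory factory;
    private final Templates templates; // compiled once; Templates may be shared across threads

    CachedXsltComponent(String xslt) {
        // Assumes the default JAXP factory supports SAX, as the JDK's built-in one does.
        this.factory = (SAXTransformerFactory) TransformerFactory.newInstance();
        try {
            this.templates = factory.newTemplates(new StreamSource(new StringReader(xslt)));
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException("stylesheet did not compile", e);
        }
    }

    // Plays the role of getPipeline(): a fresh stateful artifact per call.
    // Synchronized because TransformerFactory is not guaranteed to be thread-safe.
    synchronized TransformerHandler getPipeline() {
        try {
            return factory.newTransformerHandler(templates);
        } catch (TransformerConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}

final class CachedXsltDemo {
    static final String XSLT =
        "<xsl:stylesheet version='1.0' xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/doc'>Hello, <xsl:value-of select='@name'/>!</xsl:template>"
      + "</xsl:stylesheet>";

    // Feeds <doc name="..."/> as SAX events through a fresh pipeline artifact.
    static String run(CachedXsltComponent component, String name) {
        TransformerHandler pipe = component.getPipeline();
        StringWriter out = new StringWriter();
        pipe.setResult(new StreamResult(out));
        AttributesImpl attrs = new AttributesImpl();
        attrs.addAttribute("", "name", "name", "CDATA", name);
        try {
            pipe.startDocument();
            pipe.startElement("", "doc", "doc", attrs);
            pipe.endElement("", "doc", "doc");
            pipe.endDocument();
        } catch (SAXException e) {
            throw new IllegalStateException(e);
        }
        return out.toString();
    }
}
```

The expensive step (compiling the stylesheet) happens once. Whether new()ing the per-use artifact is cheap enough in practice is exactly the open performance question and would need measuring.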


> The new way would probably add a GeneratorManager for this purpose.
> However, the artifact returned is preinitialized with everything it
> needs.  The GeneratorManager, TransformerManager, and SerializerManager
> can all take care of usage semantics if it handles pooled items.

How do they differ from ComponentSelector?


> Otherwise stated, it would be *more* correct to return artifacts to a
> specific manager than it would be to return it to a lookup mechanism.
> What we want to restore is the separation of concerns for the lookup
> mechanism.  The CM was only designed to be a lookup mechanism--not a
> container.
> 
> >
> > Thus, XMLSource becomes heavy and Transformer light.
> > Obviously, Transformer becomes ThreadSafe (which is good) and
> > XMLSource must be made Poolable (it's heavy, it is stateful).
> 
> 
> Not necessarily--there are other possibilities for optimization
> at a systemic level that would not otherwise present itself.
> 
> 
> > Instead of having one component we ended up with two. Please
> > tell me I see things wrongly.
> >
> > <snip what="simple pipeline"/>
> 
> You end up with one management component, and artifacts it returns.
> Those artifacts can be cached results, compiled XML streams, or
> C2 Generators, etc.  We are no longer limited by our architecture.
> We can have more intelligent operations on the pipeline components.

I think the discussion here has drifted away from the topic... The
architecture of a future Cocoon should be discussed separately. I was only
trying to clarify how the container will handle the absence of the
release() method.


> > > As the ContentHandler.endDocument() is called on each item,
> > > they are automatically returned to their pools.
> >
> > Two issues on this one:
> >
> > 1. endDocument might never be called. I can discard a
> > component after evaluating its cache ID or cache validity.
> >
> > 2. endDocument does not necessarily indicate that I'm done
> > with this component.

What about these points?


> > Simple example: you are using serializer
> > to serialize xml fragment 100 times. It would be logical to
> > make a loop:
> >
> > serializer = lookup();
> > for(;;){
> >   serializer.setDestination();
> >   serializer.startDocument();
> >   ...
> >   serializer.endDocument();
> > }
> >
> 
> Wrong application.  It is a Transformer's job to modify the XML
> so that you have an XML fragment repeating 100 times.

This code *is* in transformer. Consider input XML:

<x:repeat times="100">
  <x:write>

    ... some xml goes here ...

  </x:write>
</x:repeat>


> The
> Serializer should only operate on the XML given to it.

It is given the XML, and operates only on it.


> A serializer should _*never*_ modify the content of the XML.  It
> can only modify the binary stream's representation of it.

It does not. I guess you did not understand my point. The point is:
endDocument is no indication to the component manager that this component
is free.
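
To show the hazard, here is a minimal sketch. It assumes a hypothetical serializer that hands itself back to its pool inside endDocument(); `AutoReleasingSerializer` and `EndDocumentDemo` are invented for illustration and are not ECM or Fortress classes.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical pooled serializer that returns itself to the pool on endDocument().
final class AutoReleasingSerializer extends DefaultHandler {
    private final Deque<AutoReleasingSerializer> pool;
    int documents; // number of documents completed so far

    AutoReleasingSerializer(Deque<AutoReleasingSerializer> pool) {
        this.pool = pool;
    }

    @Override
    public void startDocument() {
        // A real serializer would reset its output state here.
    }

    @Override
    public void endDocument() {
        documents++;
        pool.push(this); // the container assumes the caller is done with this instance
    }
}

final class EndDocumentDemo {
    // Returns true if a second client received the instance the first client still uses.
    static boolean instanceIsSharedAfterEndDocument() {
        Deque<AutoReleasingSerializer> pool = new ArrayDeque<>();
        AutoReleasingSerializer mine = new AutoReleasingSerializer(pool);

        mine.startDocument();
        mine.endDocument();   // the first of several fragments is finished

        AutoReleasingSerializer theirs = pool.poll(); // another lookup happens now
        mine.startDocument(); // ...while the first client starts fragment two
        return theirs == mine;
    }
}
```

Under this assumption both clients end up holding the same stateful object, which is the situation an explicit release() prevents.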


> > > As to timeouts, we can use one policy for the container type.  For
> > > example, Cocoon would benefit from a request based approach.
> >
> > What if processing continues after sending response?
> > I.e., after endDocument() on serializer, some work is done in
> > transformer? Like invoking other serializer?
> 
> Then you have broken Cocoon's design.  A Transformer does not invoke
> serializer. 

Transformers now use Sources, LDAP connections, SQL connections,
XML:DB collections, files, Loggers... What makes a serializer so special?
Why write, say, the XML->PDF code again instead of reusing it? Or
SAX->XML-in-a-String?


> Ever.  It is the Sitemap's responsibility to manage all
> pipelines--whether they have branched or not.  Once all processing for
> a request is done--and the sitemap or at least the Cocoon container
> knows this unequivocally--then it can reclaim the components.

    Exactly!

The Cocoon *container* knows! But this is *not* indicated by some
endDocument() on some (intermediate) component in the middle of
processing!

But when and how do you collect the components used during processing and
return them to the pool? Right now this is done as soon as a component is
no longer needed. If you do this only once, and only after the *whole*
processing is finished, you are bound to hold (critical) resources longer
than necessary.
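
The difference can be sketched with a toy pool. It assumes each processing step needs one pooled resource only for its own duration; `CountingPool` and `ReleaseTimingDemo` are hypothetical names, not container code.

```java
import java.util.concurrent.Semaphore;

// Hypothetical fixed-size pool that only counts how many resources are free.
final class CountingPool {
    private final Semaphore permits;

    CountingPool(int size) {
        permits = new Semaphore(size);
    }

    boolean tryLookup() {
        return permits.tryAcquire();
    }

    void release() {
        permits.release();
    }

    int available() {
        return permits.availablePermits();
    }
}

final class ReleaseTimingDemo {
    // Runs `steps` processing steps and reports the lowest number of free resources seen.
    static int lowestAvailability(int poolSize, int steps, boolean releaseEagerly) {
        CountingPool pool = new CountingPool(poolSize);
        int held = 0;
        int lowest = poolSize;
        for (int i = 0; i < steps; i++) {
            if (!pool.tryLookup()) {
                break; // pool exhausted
            }
            held++;
            lowest = Math.min(lowest, pool.available());
            if (releaseEagerly) {
                pool.release(); // give the resource back as soon as the step is done
                held--;
            }
        }
        // End of request: the container reclaims whatever is still held.
        for (; held > 0; held--) {
            pool.release();
        }
        return lowest;
    }
}
```

With a pool of 4 and 3 steps, eager release never drops below 3 free resources, while reclaiming only at the end of the request drops to 1.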


> > > Other
> > > containers may have to use a timeout based approach.  It's up to the
> > > container.  Are timeouts sufficient?  No.  Does it add additional
> > > complexity for the container? Yes.  Does it help the developer?
> > > Absolutely.
> >
> > There are situations when a transaction takes hours to process
> > (I do not mean a DB transaction here). How will this be handled?
> 
> Wow.  Hours? Then you need to think of a different way of handling
> that transaction. That is a deeper design issue that needs serious
> thought for that application.

Simple example: printing invoices at the end of the month. You don't want
to hold lots of critical resources during, say, an 8-hour process in the
top-level component which performs this, right?


> > > > But component state is lost in the "refresh". Meaning
> > > > that for a SAX
> > > > transformer or *any other component with state* you have
> > > > screwed up
> > > > the processing. (So don't allow components with state,
> > > > then - well,
> > > > then they are all ThreadSafe and we do not need
> > > > pools.)
> > >
> > > See above.  The Cocoon pipeline component interfaces are really
> > > screwed up in this respect.  A component's state should be
> > > sufficient
> > > per thread.
> >
> > A thread can require several components of the same type to do
> > its work. How will this be handled?
> 
> Use the ***Manager approach above.  If you need a unique instance of
> a component for each lookup, then there is probably something wrong
> in your design.

J2EE has the REQUIRES_NEW transaction management attribute for EJB
methods. If you have such methods (is that considered wrong design?), all
the TxResources required for such a method must be looked up, so you will
have more than one instance of a component.


> > > Anything that is more granular than that needs a
> > > different treatment.
> >
> > What could it be?
> 
> A **Manager approach outlined above.
> 
> 
> > > > The basis of GC is that you can unambiguously tell when
> > > > an object is
> > > > no longer used - when it can not possibly be used. The
> > > > speedups we
> > > > have in pooling is due to explicitly telling the
> > > > container that this
> > > > object can be reclaimed, thus keeping the object count low.
> > >
> > > In Cocoon we have the advantage of knowing that.  A
> > > pipeline component
> > > cannot possibly be used past the processing of a request.
> >
> > Some transformers use an instance of a serializer to do their work.
> > It could be looked up on startup and returned on shutdown (to
> > speed up processing
> > - right now manager.release() is quite an expensive operation),
> > and will not depend on request/response cycles.
> 
> :) And now you are getting why we need to design our components
> so that we do not need to release() them.

Lookup/release is slow now only because it tracks all the components
looked up by this component - all these scans of ArrayLists slow
things down.
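
A minimal sketch of the cost difference, with the bookkeeping reduced to its essentials. `LookupTracker` is a hypothetical class and does not reproduce the actual ECM code; it only contrasts a linear list scan with an identity-based set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Set;

// Hypothetical bookkeeping of looked-up components, kept in two variants side by side.
final class LookupTracker {
    private final List<Object> list = new ArrayList<>();
    private final Set<Object> identitySet =
            Collections.newSetFromMap(new IdentityHashMap<>());

    void lookedUp(Object component) {
        list.add(component);
        identitySet.add(component);
    }

    // Linear scan: the cost grows with the number of outstanding components.
    boolean releaseByScan(Object component) {
        for (int i = 0; i < list.size(); i++) {
            if (list.get(i) == component) {
                list.remove(i);
                return true;
            }
        }
        return false;
    }

    // Identity-hash lookup: expected constant time per release.
    boolean releaseByIdentity(Object component) {
        return identitySet.remove(component);
    }
}
```

If the tracking is the bottleneck, changing the data structure may help without changing the lookup/release contract, though that would have to be measured too.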


> BTW, The Fortress
> container has a much shorter release() cycle because it handles
> the logic asynchronously.  It may take a little longer getting the
> instance into the pool, but it doesn't affect the critical path.

This will have to be benchmarked then.


> However, if a Transformer directly uses a Serializer then something
> is wrong.  That was never the intention of the Cocoon component
> model.

Even if using a Serializer from a Transformer is wrong, how about
using a Serializer from some other component, not a Cocoon component?

Vadim


