Mailing-List: contact avalon-dev-help@jakarta.apache.org; run by ezmlm
Precedence: bulk
Reply-To: "Avalon Developers List" <avalon-dev@jakarta.apache.org>
Reply-To: <bloritsch@apache.org>
From: "Berin Loritsch" <bloritsch@apache.org>
To: "'Vadim Gritsenko'" <vadim.gritsenko@verizon.net>,
   "'Avalon Developers List'" <avalon-dev@jakarta.apache.org>,
   <cocoon-dev@xml.apache.org>
Subject: RE: [Design] ContainerManager is under fire--let's find the best
 resolution
Date: Fri, 7 Jun 2002 11:12:05 -0400
Message-ID: <003b01c20e35$afcabc30$ac00a8c0@Gabriel>
MIME-Version: 1.0
Content-Type: text/plain;
	charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
In-Reply-To: <00be01c20e31$7fef22c0$0a00a8c0@vgritsenkopc>
Importance: Normal

> From: Vadim Gritsenko [mailto:vadim.gritsenko@verizon.net]
>
>
> And work itself will be done in XMLSource/XMLPipeline/XMLSink. I got
> that part. Hope performance will not be sacrificed by this move (you
> will be new()ing this objects all the time)

Modern JVMs have better GC policies and are quicker at handling trivial
objects.  You can still do pooling, but it is handled by the
GeneratorManager.


> > The new way would probably add a GeneratorManager for this purpose.
> > However, the artifact returned is preinitialized with everything it
> > needs.  The GeneratorManager, TransformerManager, and SerializerManager
> > can all take care of usage semantics if it handles pooled items.
>
> How they differ from ComponentSelector?


More focused management policies, type safety (no more casting), and
the "setUp()" method becomes the query method.  This allows more
specific criteria for Generator types.

Furthermore, a GeneratorManager can declare its own semantics.  If you
want the release() method there, then there are no issues conflicting
with overall CM design.
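
As an illustration only, a typed manager along these lines might look
like the sketch below.  Every name in it (the getSource() query method,
the "file" type key, SimpleGeneratorManager, PooledFileSource) is
hypothetical and not part of Avalon or Cocoon; it only shows a lookup
that needs no cast, returns a preinitialized object, and keeps pooling
and release() inside the manager.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for the XMLSource discussed in this thread.
interface XMLSource {
    String uri();
}

// Hypothetical typed manager: the query method replaces lookup + cast + setUp().
interface GeneratorManager {
    XMLSource getSource(String type, String uri); // returns a ready-to-use source
    void release(XMLSource source);               // usage semantics owned by this manager
}

// Assumed pooled implementation; callers never see this class.
final class PooledFileSource implements XMLSource {
    private String uri;
    void init(String uri) { this.uri = uri; }
    public String uri() { return uri; }
}

final class SimpleGeneratorManager implements GeneratorManager {
    private final Deque<PooledFileSource> pool = new ArrayDeque<>();
    int created = 0; // counts real instantiations, to show that pooling happens

    public XMLSource getSource(String type, String uri) {
        if (!"file".equals(type)) {
            throw new IllegalArgumentException("unknown type: " + type);
        }
        PooledFileSource source = pool.poll();
        if (source == null) {
            source = new PooledFileSource();
            created++;
        }
        source.init(uri); // preinitialized before it is handed out
        return source;
    }

    public void release(XMLSource source) {
        if (source instanceof PooledFileSource) {
            pool.push((PooledFileSource) source);
        }
    }
}
```

The caller only ever deals with GeneratorManager and XMLSource, so
whether instances are pooled or created fresh is a private decision of
the manager.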


>
> I think discussion here was carried away from topic... Architecture of
> future Cocoon should be discussed separately. I was trying only to
> clarify how container will handle absence of the release() method.

The GeneratorManager would handle the release() method, or it would
declare its semantics for use.

Component use should not be a function of component lookup.


> > > > As the ContentHandler.endDocument() is called on each item,
> > > > they are automatically returned to their pools.
> > >
> > > Two issues on this one:
> > >
> > > 1. endDocument might be never be called. I can discard
> > > component after evaluating its cache ID or cache validity.
> > >
> > > 2. endDocument does not necessarily indicates that I'm done
> > > with this component.
>
> What about these points?

By implementing the GeneratorManager et al., the CM doesn't care
about it, and component GC is not necessary.  The XMLSource can be
released to the GeneratorManager.
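
A rough sketch of that explicit release, under the assumption of a
hypothetical SerializerManager and XMLSink (neither is existing Cocoon
API): the sink goes through many startDocument()/endDocument() cycles
and is handed back exactly once, by the code that obtained it.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sink; only the two SAX-like calls needed for the illustration.
final class XMLSink {
    int documents = 0;
    void startDocument() { /* begin writing one document */ }
    void endDocument() { documents++; } // does NOT return the sink to any pool
}

// Hypothetical manager that owns the pool and the release() semantics.
final class SerializerManager {
    private final Deque<XMLSink> idle = new ArrayDeque<>();

    XMLSink getSink() {
        XMLSink sink = idle.poll();
        return (sink != null) ? sink : new XMLSink();
    }

    void release(XMLSink sink) { idle.push(sink); }

    int idleCount() { return idle.size(); }
}

final class RepeatExample {
    // Writes the same fragment several times with a single sink instance.
    static int writeFragments(SerializerManager manager, int times) {
        XMLSink sink = manager.getSink();
        try {
            for (int i = 0; i < times; i++) {
                sink.startDocument();
                // ... SAX events for the fragment would go here ...
                sink.endDocument();
            }
            return sink.documents;
        } finally {
            manager.release(sink); // the caller, not endDocument(), ends the usage
        }
    }
}
```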


> > > Simple example: you are using serializer
> > > to serialize xml fragment 100 times. It would be logical to
> > > make a loop:
> > >
> > > serializer = lookup();
> > > for(;;){
> > >   serializer.setDestination();
> > >   serializer.startDocument();
> > >   ...
> > >   serializer.endDocument();
> > > }
> > >
> >
> > Wrong application.  It is a Transformer's job to modify the XML
> > so that you have an XML fragment repeating 100 times.
>
> This code *is* in transformer. Consider input XML:
>
> <x:repeat times="100">
>   <x:write>
>
>     ... some xml goes here ...
>
>   </x:write>
> </x:repeat>

That is another design problem.  It is not the Transformer's job.
It is separate from the CM interface issue.


> > The
> > Serializer should only operate on the XML given to it.
>
> It is given with the XML, and operates only on it.
>
>
> > A serializer should _*never*_ modify the content of the XML.  It
> > can only modify the binary stream's representation of it.
>
> It does not. I guess you did not understand my thought. Point is:
> endDocument is no indication to component manager that this component
> is free.

Forget GC for now.  Can you see how it can be done with a
GeneratorManager?


> > > > As to timeouts, we can use one policy for the container type.  For
> > > > example, Cocoon would benefit from a request based approach.
> > >
> > > What if processing continues after sending response?
> > > I.e., after endDocument() on serializer, some work is done in
> > > transformer? Like invoking other serializer?
> >
> > Then you have broken Cocoon's design.  A Transformer does not invoke
> > serializer.
>
> Transformers now invoke: Source, LDAP connections, SQL connections,
> XML:DB collections, files, Loggers... What makes serializer so special?
> Why code, say, XML->PDF code again and not reuse? Or
> SAX->XML-in-a-String?

What about having the sitemap handle the separate sinks?  You know, the
pipeline multiplexer/demultiplexer concept.


> > Ever.  It is the Sitemap's responsibility to manage all
> > pipelines--whether they have branched or not.  Once all processing for
> > a request is done--and the sitemap or at least the Cocoon container
> > knows this unequivocally--then it can reclaim the components.
>
>     Exactly!
>
> Cocoon *container* knows! But this is *not* indicated by some
> endDocument() on some (intermediate) component in the middle of
> processing!

Which was my original point.  The endDocument() was an example of
another possibility.  If you want to extend the SAX spec to say that
a ContentHandler is done when endDocument() is called and can free
its resources, then that's on you.
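
The request-based reclamation discussed above might be sketched like
this.  RequestScope and its methods are hypothetical names used only
for illustration, not Cocoon or Avalon API; the point is that the
container, which knows when the request is over, returns everything in
one step instead of inferring it from endDocument().

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical container-side bookkeeping for a single request.
final class RequestScope implements AutoCloseable {
    private final List<Object> handedOut = new ArrayList<>();
    private final List<Object> pool; // shared pool owned by the container

    RequestScope(List<Object> pool) { this.pool = pool; }

    // Hands out a component and remembers it; there is no release() for the caller.
    Object lookup() {
        Object component = pool.isEmpty()
                ? new Object()
                : pool.remove(pool.size() - 1);
        handedOut.add(component);
        return component;
    }

    // Called by the container once all processing for the request is done.
    @Override
    public void close() {
        pool.addAll(handedOut);
        handedOut.clear();
    }
}
```

The trade-off is the one raised in this thread: nothing goes back to
the pool until the whole request has finished.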

>
> But when and how you collect and return to the pool components used
> during processing? Right now this is done as soon as component is not
> needed. If you do this only once and only after *whole* processing is
> finished you are bound to hold (critical) resources longer than
> necessary.

That is a price of GC systems.  However, you can make critical resources
less prone to extended resource holding by providing something akin to
the DataSourceComponent, even if you make the release() method part of
the managing component.
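
To make the idea concrete, here is a minimal sketch of such a split.
ResourceSource, Lease, and LongRunningJob are invented for this
example and are not the actual DataSourceComponent API.  The
long-lived component is cheap to hold for as long as you like, while
the scarce resource is borrowed and returned around each unit of work.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical scarce resource, handed out only for short periods.
final class Lease implements AutoCloseable {
    private final ResourceSource owner;
    final int id;

    Lease(ResourceSource owner, int id) {
        this.owner = owner;
        this.id = id;
    }

    @Override
    public void close() { owner.giveBack(this); } // returned immediately after use
}

// Hypothetical managing component: holding it does not tie up any resource.
final class ResourceSource {
    private final Deque<Lease> free = new ArrayDeque<>();
    private int inUse = 0;

    ResourceSource(int size) {
        for (int i = 0; i < size; i++) {
            free.push(new Lease(this, i));
        }
    }

    synchronized Lease acquire() {
        Lease lease = free.poll();
        if (lease == null) {
            throw new IllegalStateException("no free resources");
        }
        inUse++;
        return lease;
    }

    synchronized void giveBack(Lease lease) {
        inUse--;
        free.push(lease);
    }

    synchronized int inUse() { return inUse; }
}

final class LongRunningJob {
    // Keeps the ResourceSource for the whole run, but each lease only per step.
    // Returns the largest number of leases held at any one time.
    static int run(ResourceSource source, int steps) {
        int maxHeld = 0;
        for (int i = 0; i < steps; i++) {
            try (Lease lease = source.acquire()) {
                maxHeld = Math.max(maxHeld, source.inUse());
                // ... use the leased resource for one unit of work ...
            }
        }
        return maxHeld;
    }
}
```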


> > > > Other
> > > > containers may have to use a timeout based approach.  It's up to
> > > > the container.  Are timeouts sufficient?  No.  Does it add
> > > > additional complexity for the container? Yes.  Does it help the
> > > > developer? Absolutely.
> > >
> > > There are situations when transaction takes hours to process
> > > (I do not mean DB transaction here). How this will happen?
> >
> > Wow.  Hours? Then you need to think of a different way of handling
> > that transaction. That is a deeper design issue that needs serious
> > thought for that application.
>
> Simple example: print invoices at the end of the month. You don't want
> to hold lots of critical resources during, say, 8 hour process in
> top-level component which performs this, right?

Yes, but you wouldn't necessarily have your production (i.e. web) system
doing this either.  It would be an offline process kicked off from the
command line (cron daemon) or something else along those lines.  It is an
asynchronous process.  Smarter component design will let you avoid the
need for pooling, causing fewer resources to be used, less resource
contention, and ultimately higher performance.


> > > > > But component state is lost in the "refresh". Meaning
> > > > > that for a SAX
> > > > > transformer or *any other component with state* you have
> > > > > screwed up
> > > > > the processing. (So don't allow components with state,
> > > > > then - well,
> > > > > then they are all ThreadSafe and we do not need
> > > > > pools.)
> > > >
> > > > See above.  The Cocoon pipeline component interfaces are really
> > > > screwed up in this respect.  A component's state should be
> > > > sufficient
> > > > per thread.
> > >
> > > Thread can require several components of the same type to do
> > > its work. How this will be handled?
> >
> > Use the ***Manager approach above.  If you need a unique instance of
> > a component for each lookup, then there is probably something wrong
> > in your design.
>
> J2EE has REQUIRES_NEW transaction management attribute for the EJB
> method. If you have such methods (is it considered wrong design?), all
> required for this method TxResource-s should be looked up, thus you
> will have more than one instance of a component.

J2EE also allowed you to declare Servlets as single use (not one
instance per thread or sharing an instance among threads)--does that
make it correct design?  It was a serious bottleneck allowing a Q&D hack.

> > BTW, The Fortress
> > container has a much shorter release() cycle because it handles
> > the logic asynchronously.  It may take a little longer getting the
> > instance into the pool, but it doesn't affect the critical path.
>
> This will have to be benchmarked then.

There is a performance benchmark that uses ECM/Fortress in Fortress's
test code.  It has been compared.
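
For readers unfamiliar with the idea, an asynchronous release() could
be sketched roughly as follows.  This is not Fortress's actual
implementation; AsyncReleasePool and its methods are assumptions made
for the example.  release() only enqueues the instance and returns,
and a background thread does the work of putting it back into the
pool.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical pool whose release() stays off the caller's critical path.
final class AsyncReleasePool<T> {
    private final ConcurrentLinkedQueue<T> idle = new ConcurrentLinkedQueue<>();
    private final BlockingQueue<T> pendingRelease = new LinkedBlockingQueue<>();
    private final Thread worker;

    AsyncReleasePool() {
        worker = new Thread(() -> {
            try {
                while (true) {
                    T item = pendingRelease.take(); // blocks until something is released
                    // ... any expensive recycling or cleanup would happen here ...
                    idle.add(item);
                }
            } catch (InterruptedException e) {
                // interrupted by shutdown(): let the thread end
            }
        });
        worker.setDaemon(true);
        worker.start();
    }

    // Returns immediately; the instance reaches the pool a little later.
    void release(T item) { pendingRelease.add(item); }

    // Returns an idle instance, or null if none is available yet.
    T poll() { return idle.poll(); }

    void shutdown() { worker.interrupt(); }
}
```

The cost is the one noted in the quoted text: an instance may take a
little longer to become available again.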


> > However, if a Transformer directly uses a Serializer then something
> > is wrong.  That was never the intention of the Cocoon component
> > model.
>
> Even if it is wrong about using Serializer from Transformer. How about
> using Serializer from some other component, not Cocoon component?


Design how your system is supposed to interact--then enforce it.

