cocoon-dev mailing list archives

From Sylvain Wallez <>
Subject Re: [FYI] Profiling Cocoon...
Date Sun, 06 Oct 2002 21:04:24 GMT
Stefano Mazzocchi wrote:

> Hello people,
> I'm currently at Giacomo's place and we spent a rainy afternoon 
> profiling the latest Cocoon to see if there is something we could 
> fix/improve/blah-blah.
> WARNING: this is *by no means* a scientific report. But we have tried 
> to be as informative as possible for developers.
> We were running Tomcat 4.1.10 + Cocoon HEAD on Sun JDK 1.4.1-b21 on 
> linux, instrumented with Borland OptimizeIt 4.2.
> Here is what we discovered:
> 1) Regarding memory leaks, Cocoon seems absolutely clean (by Cocoon 
> we mean the org.apache.cocoon.* classes). Avalon seems to be clean as 
> well. Good job everyone.
> 2) we noticed an incredible use of 
> org.apache.avalon.excalibur.collections.BucketMap$Node. It is *by far* 
> the most used class in the heap. More than Strings, byte[], char[] and 
> int[]. Some 140000 instances of that class.
> The number of bucketmap nodes grows linearly with the amount of 
> different pages accessed (as they are fed into the cache), but even a 
> cached resource creates some 44 new nodes, which are later garbage 
> collected.
> 44 is nothing compared to 140000, but still something to investigate.
> So, discovery #1:
>    BucketMaps are used *a lot*. Be aware of this.

AFAIK, bucketmaps are used as soon as a component is looked up, and 
getting a page from the cache shouldn't reduce the number of lookups 
much, since the pipeline has to be built to get the cache key and validity.

What could save some lookups is to have more ThreadSafe components, 
including pipeline components. For example, a generator could 
theoretically be ThreadSafe (it has essentially one generate() method), but 
the fact that setup() and generate() are separate calls currently prevents this.

Also, we have to consider that a component lookup is more costly than 
instantiating a small object. Knowing this, some transformers and 
serializers can be thought of as factories of lightweight content 
handlers that do the actual job. These transformers and serializers 
could then be made ThreadSafe and thus avoid the per-request lookup.

This would require some new interfaces, which should coexist with the 
old ones to ensure backwards compatibility.
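To make the idea concrete, here is a minimal sketch. All the names below (ContentWorker, UppercaseSerializer, newWorker) are hypothetical and exist nowhere in Cocoon; the point is only the shape: the factory is looked up once and shared, while each request gets a cheap throwaway worker instead of a fresh component lookup.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical lightweight worker: one instance per request, cheap to create.
interface ContentWorker {
    String process(String input);
}

// Hypothetical ThreadSafe component: looked up once, shared by all requests.
// It holds no per-request state, so concurrent calls to newWorker() are safe.
class UppercaseSerializer {
    private final AtomicInteger created = new AtomicInteger();

    ContentWorker newWorker() {
        created.incrementAndGet();
        // The worker carries any per-request state; the factory stays stateless.
        return input -> input.toUpperCase();
    }

    int workersCreated() {
        return created.get();
    }
}

public class FactoryDemo {
    public static void main(String[] args) {
        UppercaseSerializer serializer = new UppercaseSerializer(); // one lookup
        String out = serializer.newWorker().process("hello");       // per request
        System.out.println(out);                        // HELLO
        System.out.println(serializer.workersCreated()); // 1
    }
}
```

Creating the small worker object should be cheaper than going through the component manager on every request, which is exactly the trade-off described above.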

Thoughts ?

> 3) Catalina seems to be spending 10% of the pipeline time. Having 
> extensively profiled and carefully optimized a servlet engine (JServ) 
> I can tell you that this is *WAY* too much. Catalina doesn't seem like 
> the best choice to run a loaded servlet-based site (contact 
> if you want to do something about it: he's been working for several 
> weeks now on Jerry, a super-light servlet engine based on native APR 
> and targeted especially for Apache 2.0).

> 4) java IO takes something from 20% to 35% of the entire request time 
> (reading and writing from the socket). This could well be a problem 
> with the instrumented JVM since I don't think the JDK 1.4 is that slow 
> on IO (especially using the new NIO facilities internally)
> 5) most of the time is spent on:
>   a) XSLT processing (and we knew that)
>   b) DTD parsing (and that was a surprise for me!)
> Yeah, DTD parsing. No, not for validation, but for entity resolution. 
> It seems that even if the parser is non-validating, the DTD is fully 
> parsed anyway just to do entity evaluation.
> So, discovery #2:
>    Be careful about DTDs even if the parser is not validating.
> Of course, when the cache kicks in and the cached document is read 
> directly from the compiled SAX events, we have an incredible speed 
> improvement (also because entities are already resolved and hardwired).
> 6) Xalan incremental seems to be a little slower than regular Xalan, 
> but on multiprocessing machines this might not be the case [Xalan uses 
> two threads for incremental processing]
> NOTE: Xalan doesn't pool threads when it does that!
> So, while perceived performance is better for Xalan in incremental 
> mode, the overall load of the machine is reduced if Xalan is used 
> normally.
> 7) XSLTC *IS* blazingly fast compared to Xalan and is much less 
> resource intensive.
> Discovery #3:
>  use XSLTC as much as possible!
> NOTE: our current root sitemap.xmap indicates that XSLTC is the default 
> XSLT engine for Cocoon 2.1, but in fact the XSLTC factory is 
> commented out, so Xalan is what actually runs. We should either remove 
> that note or uncomment the XSLTC factory.
> I vote for making XSLTC the default even if this generates a few bug reports.
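Whichever engine is the default, the TrAX API also lets you compile a stylesheet once into a Templates object and reuse it across requests, which is where XSLTC's ahead-of-time compilation really pays off. A small engine-neutral sketch against plain JAXP (the tiny inline stylesheet is just for illustration):

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class CompiledTemplates {
    private static final String XSLT =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:output method=\"text\"/>"
      + "<xsl:template match=\"/\"><xsl:value-of select=\"//title\"/>"
      + "</xsl:template></xsl:stylesheet>";

    // Compile once: Templates is thread-safe and reusable,
    // unlike a Transformer, which is neither.
    static String transform(String xml) throws Exception {
        Templates templates = TransformerFactory.newInstance()
            .newTemplates(new StreamSource(new StringReader(XSLT)));
        StringWriter out = new StringWriter();
        // Per request: newTransformer() off the compiled Templates is cheap.
        templates.newTransformer().transform(
            new StreamSource(new StringReader(xml)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transform("<page><title>Hello</title></page>")); // Hello
    }
}
```

In a real server the Templates object would of course be built once and cached, not per call as in this sketch.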


> 8) Cocoon's hotspot is.... drum roll.... URI matching.
> TreeProcessor is complex and adds lots of complexity to the call 
> stacks, but it seems to be very lightweight.

I'm happy to hear that :-) The TreeProcessor was designed to be as fast 
as possible, even if interpreted: pre-process everything that can be, 
and pre-lookup components when they're ThreadSafe. Call stacks can be 
impressive, but each frame performs very few computations.

> It's URI matching that is the thing that needs more work performance-wise.
> Don't get me wrong, my numbers indicate that URI matching takes from 3% 
> to 8% of response time. Compared to the rest it's nothing, but since 
> this is the only thing we are in total control of, this is where we 
> should concentrate profiling efforts.

Do you mean the WildcardURIMatcher? Is this related to the matching 
algorithm, or to the number of patterns that have to be tested for a 
typical request?
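Just to pin down what I mean by "matching algorithm": translating sitemap patterns like `*.html` or `news/**` into checks against the request URI. Here's an illustrative implementation (not Cocoon's actual one) that compiles a pattern to a regex, with `*` confined to one path segment and `**` crossing segments:

```java
import java.util.regex.Pattern;

public class WildcardMatch {
    // Translate a wildcard pattern into a regex:
    // "**" matches any characters, "*" stops at "/".
    static Pattern compile(String pattern) {
        StringBuilder re = new StringBuilder();
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            if (c == '*') {
                if (i + 1 < pattern.length() && pattern.charAt(i + 1) == '*') {
                    re.append(".*");   // "**": cross path segments
                    i++;               // consume the second '*'
                } else {
                    re.append("[^/]*"); // "*": stay within one segment
                }
            } else {
                re.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(re.toString());
    }

    static boolean matches(String pattern, String uri) {
        return compile(pattern).matcher(uri).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("*.html", "index.html"));       // true
        System.out.println(matches("*.html", "docs/index.html"));  // false
        System.out.println(matches("**.html", "docs/index.html")); // true
    }
}
```

If the cost is per-pattern rather than per-match, pre-compiling all patterns at sitemap build time (as this sketch's compile() step suggests) is the obvious lever.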

> Ok, that's it. Enough for a rainy swiss afternoon.
> Anyway, Cocoon is pretty optimized for what we could see. So let's be 
> happy about it.

Have you compared the respective speeds of 2.0.x and 2.1 on the same 
application? It would be interesting to know whether 2.1 performs 
better than its ancestor.


Sylvain Wallez
 Anyware Technologies                  Apache Cocoon 
