cocoon-dev mailing list archives

From: Stefano Mazzocchi
Subject: [FYI] Profiling Cocoon...
Date: Sun, 06 Oct 2002 17:30:11 GMT

Hello people,

I'm currently at Giacomo's place and we spent a rainy afternoon 
profiling the latest Cocoon to see if there is something we could 
improve.

WARNING: this is *by no means* a scientific report. But we have tried to 
be as informative as possible for developers.

We were running Tomcat 4.1.10 + Cocoon HEAD on Sun JDK 1.4.1-b21 on 
linux, instrumented with Borland OptimizeIt 4.2.

Here is what we discovered:

1) Regarding memory leaks, Cocoon seems absolutely clean (for cocoon, we 
mean org.apache.cocoon.* classes). Avalon seems to be clean as well. 
Good job everyone.

2) we noticed an incredible use of 
org.apache.avalon.excalibur.collections.BucketMap$Node. It is *by far* 
the most used class in the heap. More than Strings, byte[], char[] and 
int[]. Some 140000 instances of that class.

The number of bucketmap nodes grows linearly with the amount of 
different pages accessed (as they are fed into the cache), but even a 
cached resource creates some 44 new nodes, which are later garbage 
collected.

44 is nothing compared to 140000, but still something to investigate.

So, discovery #1:

    BucketMaps are used *a lot*. Be aware of this.
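
To make the linear growth concrete, here is a minimal sketch of a 
bucketed hash map. It is *not* the Excalibur BucketMap source: the class 
BucketSketch and everything in it are hypothetical, written only to show 
that this kind of structure allocates one node per distinct key, so the 
node count follows the number of distinct pages put into a cache.

```java
/** Hypothetical bucketed map; an illustration, not Excalibur's BucketMap. */
class BucketSketch<K, V> {

    /** One Node is allocated per distinct key stored in the map. */
    private static final class Node<K, V> {
        final K key;
        V value;
        final Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private final Node<K, V>[] buckets;
    private int nodeCount;

    @SuppressWarnings("unchecked")
    BucketSketch(int size) {
        buckets = (Node<K, V>[]) new Node[size];
    }

    void put(K key, V value) {
        int i = Math.floorMod(key.hashCode(), buckets.length);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                n.value = value; // existing key: no new node
                return;
            }
        }
        buckets[i] = new Node<>(key, value, buckets[i]); // new key: one new node
        nodeCount++;
    }

    V get(K key) {
        int i = Math.floorMod(key.hashCode(), buckets.length);
        for (Node<K, V> n = buckets[i]; n != null; n = n.next) {
            if (n.key.equals(key)) {
                return n.value;
            }
        }
        return null;
    }

    int nodeCount() {
        return nodeCount;
    }

    public static void main(String[] args) {
        BucketSketch<String, String> cache = new BucketSketch<>(64);
        for (int i = 0; i < 1000; i++) {
            cache.put("/page/" + i, "cached response");
        }
        System.out.println("nodes: " + cache.nodeCount());
    }
}
```

Putting the same keys a second time does not add nodes in this sketch, 
which is why short-lived maps created per request (like the 44 nodes 
above) show up as garbage rather than as heap growth.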

3) Catalina seems to account for 10% of the pipeline time. Having 
extensively profiled and carefully optimized a servlet engine (JServ) I 
can tell you that this is *WAY* too much. Catalina doesn't seem like the 
best choice to run a loaded servlet-based site (contact 
if you want to do something about it: he's working on Jerry, a 
super-light servlet engine based on native APR and targeted especially 
for Apache 2.0)

4) Java IO takes somewhere from 20% to 35% of the entire request time 
(reading and writing from the socket). This could well be a problem with 
the instrumented JVM since I don't think JDK 1.4 is that slow on IO 
(especially using the new NIO facilities internally)

5) most of the time is spent on:

   a) XSLT processing (and we knew that)
   b) DTD parsing (and that was a surprise for me!)

Yeah, DTD parsing. No, not for validation, but for entity resolution. It 
seems that even if the parser is non-validating, the DTD is fully parsed 
anyway just to do entity resolution.

So, discovery #2:

    Be careful about DTDs even if the parser is not validating.
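
The effect can be reproduced outside Cocoon with plain JAXP. The sketch 
below is an illustration under assumptions, not what we profiled: the 
class DtdProbe, the inlined DTD and the example.invalid system ID are all 
made up. It turns validation off, logs every system ID the parser asks 
the entity resolver for, and collects the text the parser reports.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DtdProbe {
    // Hypothetical DTD, inlined so the sketch needs no network or files.
    static final String DTD = "<!ENTITY greeting \"hello\">";

    // Hypothetical document that references the DTD and uses one entity.
    static final String DOC =
        "<!DOCTYPE page SYSTEM \"http://example.invalid/page.dtd\">"
        + "<page>&greeting;</page>";

    /**
     * Parses DOC with validation off. Returns the system IDs the parser
     * requested from the resolver, followed by the character data seen.
     */
    static List<String> probe() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setValidating(false); // non-validating parser

        SAXParser parser = factory.newSAXParser();
        List<String> log = new ArrayList<>();
        StringBuilder text = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public InputSource resolveEntity(String publicId, String systemId) {
                log.add(systemId); // the parser wants the DTD
                return new InputSource(new StringReader(DTD));
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };

        parser.parse(new InputSource(new StringReader(DOC)), handler);
        log.add(text.toString());
        return log;
    }

    public static void main(String[] args) throws Exception {
        for (String entry : probe()) {
            System.out.println(entry);
        }
    }
}
```

With the parser bundled in the JDK, the resolver is asked for the DTD 
even though validation is off, because the DTD is needed to expand 
&greeting;. With a real DTD that request means reading and parsing the 
whole file on every uncached request.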

Of course, when the cache kicks in and the cached document is read 
directly from the compiled SAX events, we have an incredible speed 
improvement (also because entities are already resolved and hardwired).

6) Xalan incremental seems to be a little slower than regular Xalan, but 
on multiprocessing machines this might not be the case [Xalan uses two 
threads for incremental processing]

NOTE: Xalan doesn't pool threads when it does that!

So, while perceived performance is better for Xalan in incremental mode, 
the overall load of the machine is reduced if Xalan is used normally.

7) XSLTC *IS* blazingly fast compared to Xalan and is much less resource 
intensive.

Discovery #3:

  use XSLTC as much as possible!

NOTE: our current root sitemap.xmap indicates that XSLTC is the default 
XSLT engine for Cocoon 2.1, but in fact the XSLTC factory is commented 
out, so Xalan is what actually runs. We should either remove that 
statement or uncomment the XSLTC factory.

I vote for making XSLTC default even if this generates a few bug reports.
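
For those who want to compare engines outside Cocoon, here is a rough 
JAXP sketch. It is not the Cocoon sitemap configuration. The class 
XsltEngineSketch and the stylesheet are made up, and the code uses 
whatever TransformerFactory the JVM provides by default. Assuming a Xalan 
distribution is on the classpath, XSLTC can be selected by setting the 
javax.xml.transform.TransformerFactory system property to 
org.apache.xalan.xsltc.trax.TransformerFactoryImpl. The stylesheet is 
compiled once into a Templates object and reused for every request.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

class XsltEngineSketch {
    // Hypothetical stylesheet, inlined to keep the sketch self-contained.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"text\"/>"
        + "<xsl:template match=\"/page\">"
        + "Title: <xsl:value-of select=\"title\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Compiles the stylesheet once; Templates objects are thread-safe. */
    static Templates compile() throws Exception {
        // Uses the JVM's default factory. The engine behind it depends on
        // the javax.xml.transform.TransformerFactory system property and
        // on what is on the classpath (an assumption of this sketch).
        TransformerFactory factory = TransformerFactory.newInstance();
        return factory.newTemplates(new StreamSource(new StringReader(XSL)));
    }

    /** Applies an already compiled stylesheet to one input document. */
    static String apply(Templates templates, String xml) throws Exception {
        StringWriter out = new StringWriter();
        templates.newTransformer().transform(
            new StreamSource(new StringReader(xml)),
            new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Templates templates = compile();
        System.out.println(templates.getClass().getName());
        System.out.println(apply(templates, "<page><title>Cocoon</title></page>"));
    }
}
```

Printing the Templates class name is a quick way to check which engine 
is really in use, which is exactly the kind of mismatch we found in the 
sitemap.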

8) Cocoon's hotspot is.... drum roll.... URI matching.

TreeProcessor is complex and adds lots of complexity to the call stacks, 
but it seems to be very lightweight. It's URI matching that is the thing 
that needs more work performance-wise.

Don't get me wrong, my numbers indicate that URI matching takes from 3% 
to 8% of response time. Compared to the rest it is nothing, but since 
this is the only thing we are in total control of, this is where we 
should concentrate profiling efforts.

Ok, that's it. Enough for a rainy Swiss afternoon.

Anyway, Cocoon is pretty optimized for what we could see. So let's be 
happy about it.

Stefano Mazzocchi
