cocoon-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Ralph Goers <Ralph.Go...@dslextreme.com>
Subject Re: Cocoon 2.1.7 hang
Date Thu, 19 Jan 2006 04:46:34 GMT
I thought I'd update you all on the problems we have been having with 
our production deployment.  We finally rolled out an update that include 
proper pool sizes for the components and the fix to the Castor mapping 
file.  However, before we did that our system engineer provided some 
valuable insight.  He provided a graph that shows one CPU going into a 
hard loop.  About 7 minutes later the system became completely congested 
and ran out of threads.  The pool and mapping file changes helped in 
that when the first failure after the changes occured it took about 30 
minutes for the system to run out of threads.

However, that information caused me to go back and look at my stack 
traces.  It turns out that everyone single one (with the exception noted 
below) showed one thread doing the same thing.  Now, we first deployed 
this product in March of 2005 and experienced no failures until a 
product update was released in August.  That update included some new 
calculators written in flowscript.  This version of Cocoon is using 
rhino1.5r4-continuations-20040629T1232.jar. The stack traces indicate 
that these are going into a loop and causing the system to die.  At 
first we thought that the calculators were not doing proper input 
validation and causing the wierd things to happen. The stack traces kind 
of supported this in that they look like:

"http-8080-Processor18" daemon prio=1 tid=0x30e38df8 nid=0x51a8 runnable 
[2d351000..2d35387c]
    at java.lang.Class.isPrimitive(Native Method)
    at 
org.mozilla.javascript.NativeJavaObject.getConversionWeight(NativeJavaObject.java:324)
    at 
org.mozilla.javascript.NativeJavaObject.canConvert(NativeJavaObject.java:259)
    at 
org.mozilla.javascript.NativeJavaMethod.findFunction(NativeJavaMethod.java:356)
    at 
org.mozilla.javascript.NativeJavaMethod.call(NativeJavaMethod.java:193)
    at org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1244)
    at 
org.mozilla.javascript.continuations.ContinuationInterpreter.interpret(ContinuationInterpreter.java:1134)
    at 
org.mozilla.javascript.continuations.ContinuationInterpreter.interpret(ContinuationInterpreter.java:190)
    at 
org.mozilla.javascript.continuations.ContinuationInterpreter.interpret(ContinuationInterpreter.java:138)
    at 
org.mozilla.javascript.continuations.InterpretedFunctionImpl.call(InterpretedFunctionImpl.java:121)
    at org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1244)
    at 
org.mozilla.javascript.ScriptableObject.callMethod(ScriptableObject.java:1591)
    at 
org.apache.cocoon.components.flow.javascript.fom.FOM_JavaScriptInterpreter.handleContinuation(FOM_JavaScriptInterpreter.java:812)
    - locked <0x66005778> (a 
org.apache.cocoon.components.flow.javascript.fom.FOM_JavaScriptInterpreter$ThreadScope)
    at 
org.apache.cocoon.components.treeprocessor.sitemap.CallFunctionNode.invoke(CallFunctionNode.java:123)

However, we have been able to recreate the loop without entering bad 
data.  In addition, we got a trace that was close to the start of the 
loop and it is somewhat different.  It seems to imply that there is 
something wrong with Continuation handling, but I have no idea. Two 
traces taken a few minutes later both looked like the one above.

    at org.mozilla.javascript.Interpreter.doubleWrap(Interpreter.java:2491)
    at 
org.mozilla.javascript.continuations.ContinuationInterpreter.interpret(ContinuationInterpreter.java:657)
    at 
org.mozilla.javascript.continuations.ContinuationInterpreter.interpret(ContinuationInterpreter.java:190)
    at 
org.mozilla.javascript.continuations.ContinuationInterpreter.interpret(ContinuationInterpreter.java:138)
    at 
org.mozilla.javascript.continuations.InterpretedFunctionImpl.call(InterpretedFunctionImpl.java:121)
    at org.mozilla.javascript.ScriptRuntime.call(ScriptRuntime.java:1244)
    at 
org.mozilla.javascript.ScriptableObject.callMethod(ScriptableObject.java:1591)
    at 
org.apache.cocoon.components.flow.javascript.fom.FOM_JavaScriptInterpreter.handleContinuation(FOM_JavaScriptInterpreter.java:812)
    - locked <0x66005880> (a 
org.apache.cocoon.components.flow.javascript.fom.FOM_JavaScriptInterpreter$ThreadScope)
    at 
org.apache.cocoon.components.treeprocessor.sitemap.CallFunctionNode.invoke(CallFunctionNode.java:123)

We have a way to get around this problem by replacing the flowscript 
calculators with CGIs for the time being. However, we will want to do 
something about this problem in the future.  One difficulty in debugging 
this problem though is that we have no idea which calculators are 
running or where they are at the time of the failure because interpreted 
javascript doesn't show up in the stack trace.  As a consequence I will 
probably recommend that they be rewritten as JSR-168 portlets instead of 
using flow - unless someone has a better idea.

Thoughts and comments are welcome.

Ralph





Mime
View raw message