geronimo-dev mailing list archives

From Matt Hogstrom
Subject Re: [DISCUSS] Improving diagnostics
Date Thu, 14 Jun 2007 15:16:11 GMT
I saw the work you did in OEJB.  This would be a HUGE help as well.

On Jun 14, 2007, at 9:44 AM, Rick McGuire wrote:

> One thing that might be a useful first start would be something  
> similar to the "Hungry Exception" cleanup effort that was done with  
> openejb3.  Basically, this just ensured that wherever an  
> exception gets thrown because of another caught exception, the  
> original exception is preserved as its cause.  I've spent a lot of  
> time grumbling about situations where the original exception  
> information was thrown away, sometimes through multiple levels of  
> failure.
> Rick
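
To make the cleanup Rick describes concrete, here is a minimal sketch of the pattern in plain Java. The class and method names (CauseChaining, DeploymentException, readPlan, deploy) are hypothetical and do not come from Geronimo or OpenEJB; the only real API relied on is the standard Throwable cause constructor.

```java
import java.io.IOException;

public class CauseChaining {
    /** Hypothetical higher-level exception; not a Geronimo class. */
    static class DeploymentException extends Exception {
        DeploymentException(String message, Throwable cause) {
            super(message, cause); // keeps the original exception as the cause
        }
    }

    /** Stand-in for a low-level operation that fails. */
    static void readPlan(String name) throws IOException {
        throw new IOException("cannot read " + name);
    }

    static void deploy(String name) throws DeploymentException {
        try {
            readPlan(name);
        } catch (IOException e) {
            // Passing 'e' along means its message and stack trace show up
            // under "Caused by:" instead of being thrown away.
            throw new DeploymentException("deployment of " + name + " failed", e);
        }
    }

    public static void main(String[] args) {
        try {
            deploy("plan.xml");
        } catch (DeploymentException e) {
            e.printStackTrace();
        }
    }
}
```

The lossy variant is the same catch block with the exception constructed from a message only, which is what hides the root failure after a couple of levels of wrapping.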
> Matt Hogstrom wrote:
>> Lately I've been working with users in debugging various  
>> application problems.  Some of the problems are merely  
>> configuration but others are deeper application / infrastructure  
>> problems.  Regardless of the type of problem I've never personally  
>> been satisfied with the diagnostic information produced by the  
>> server (this isn't a Geronimo statement but really AppServers as a  
>> whole including WebSphere and WebLogic).
>> Here are some thoughts that I want to pursue:
>> I’ve been working with some customers lately and the work has  
>> centered around debugging some of the aspects of their server.  In  
>> this case it was using Apache Geronimo but the problem really  
>> applies to most application servers in general.  For the most part  
>> there is little diagnostic information available when an  
>> application fails.  We get the ever popular nested Java Stack  
>> trace which is certainly a good indicator of where a failure  
>> occurred but is woefully inadequate in many instances at showing why a  
>> failure occurred.  This gets worse in that for the most part  
>> people need to recreate the problem with additional tracing and,  
>> in the worst case, additional diagnostic code in their  
>> application.  Wouldn’t it be nice if some of the diagnostic  
>> capability that was needed was included in the server itself?
>> Over the next few weeks I’m going to be doing some experimentation  
>> on how to improve server diagnostics through the use of Aspects  
>> and/or Instrumentation.  Since this is experimental we’ll see what  
>> the final result will be.  Here are some of my initial goals:
>>    1. Improve diagnostics by providing a Diagnostic Report when a  
>> Tx fails.
>>    2. Provide better visualization of Java Stack traces so problem  
>> areas pop out.
>>    3. Capture wait information.
>> For number 1 I’m going to focus on servlets to begin with given  
>> that they represent the preponderance of requests made in  
>> AppServers today.  This information will include information from  
>> the request object, the servlet being invoked, invocation time,  
>> transaction ID (if it exists), enlisted connections (database and  
>> messaging), oh yeah, and the Thread ID of execution.  This is a  
>> mouthful to begin with anyway.
>> Number 2 is really just applying some template information on a  
>> Java Stack trace.  I want application classes to stand out so  
>> developers will be able to quickly see where their application is  
>> involved.  Infrastructure pieces like the server, Hibernate,  
>> TopLink, etc. would also be highlighted in a different color and  
>> style and plain old Java classes would be a boring style as they  
>> are merely pawns in the transactional game.
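
One way the stack trace idea might be prototyped is to classify each frame by package prefix. The sketch below is only an illustration: FrameClassifier, its categories, and the prefix lists are assumptions, and plain text tags stand in for the colors and styles mentioned above.

```java
import java.util.List;

public class FrameClassifier {
    enum Category { APPLICATION, INFRASTRUCTURE, PLATFORM }

    // Illustrative prefixes only; a real tool would make these configurable.
    static final List<String> INFRASTRUCTURE_PREFIXES =
            List.of("org.apache.geronimo.", "org.hibernate.", "oracle.toplink.");
    static final List<String> PLATFORM_PREFIXES =
            List.of("java.", "javax.", "sun.", "jdk.");

    /** Anything that is neither infrastructure nor platform counts as application code. */
    static Category classify(String className) {
        for (String prefix : INFRASTRUCTURE_PREFIXES) {
            if (className.startsWith(prefix)) return Category.INFRASTRUCTURE;
        }
        for (String prefix : PLATFORM_PREFIXES) {
            if (className.startsWith(prefix)) return Category.PLATFORM;
        }
        return Category.APPLICATION;
    }

    /** Renders the top-level stack trace with one category tag per frame. */
    static String annotate(Throwable t) {
        StringBuilder out = new StringBuilder(t.toString()).append('\n');
        for (StackTraceElement frame : t.getStackTrace()) {
            String tag;
            switch (classify(frame.getClassName())) {
                case INFRASTRUCTURE: tag = "[INF] "; break;
                case PLATFORM:       tag = "[JDK] "; break;
                default:             tag = "[APP] ";
            }
            out.append(tag).append("at ").append(frame).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(annotate(new IllegalStateException("demo failure")));
    }
}
```

A renderer for HTML or a console could map the three categories to styles instead of tags; nested causes are left out here to keep the sketch short.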
>> Finally, wouldn’t it be nice to know how long a thread has been  
>> waiting and for what reason?  Is it waiting on a request from  
>> another server or perhaps there is a database locking problem.   
>> Did a WebService go awry?  Basically, I want to know what in the  
>> heck a thread is waiting on.
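
Part of the wait information is already exposed by the JVM through java.lang.management, without Aspects. The sketch below (the WaitReport class is hypothetical) lists threads that are blocked or waiting, the lock they are waiting on, and, where the JVM supports contention monitoring, how long they have spent blocked or waiting since monitoring was enabled. It cannot tell a database lock from a remote call, which is where instrumentation would still be needed.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class WaitReport {
    /** One line per thread: name, id, state, lock, owner, and optional timings. */
    static String describe(ThreadInfo info, boolean timed) {
        StringBuilder sb = new StringBuilder();
        sb.append(info.getThreadName())
          .append(" [id=").append(info.getThreadId()).append("] ")
          .append(info.getThreadState());
        if (info.getLockName() != null) {
            sb.append(" on ").append(info.getLockName());
        }
        if (info.getLockOwnerName() != null) {
            sb.append(" held by ").append(info.getLockOwnerName());
        }
        if (timed) {
            // Cumulative milliseconds since contention monitoring was enabled.
            sb.append(" blockedMs=").append(info.getBlockedTime())
              .append(" waitedMs=").append(info.getWaitedTime());
        }
        return sb.toString();
    }

    /** Reports every live thread that is currently BLOCKED, WAITING, or TIMED_WAITING. */
    static String report() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean timed = threads.isThreadContentionMonitoringSupported();
        if (timed) {
            threads.setThreadContentionMonitoringEnabled(true);
        }
        StringBuilder sb = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info == null) continue;
            Thread.State state = info.getThreadState();
            if (state == Thread.State.BLOCKED
                    || state == Thread.State.WAITING
                    || state == Thread.State.TIMED_WAITING) {
                sb.append(describe(info, timed)).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```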
>> Please chime in with your thoughts.  I suspect that Aspects or  
>> Instrumentation may be the only way to go for some of this as many  
>> of the components we include won't have this capability in them.
