incubator-kato-spec mailing list archives

From David Griffiths <>
Subject Re: Snapshot Application development usecases
Date Mon, 04 Jan 2010 17:53:28 GMT
Greetings. My name is Dave Griffiths and I've worked in IBM's Java
Service group for many years. I have experience with Kato's
predecessor DTFJ and various related technologies. I've been meaning
to subscribe to this mailing list for some time so apologies if I'm a
bit late to the party.

Just some random thoughts on the concept of snapshot dumps.

Problem analysis seems to fall into three main categories:

1) High CPU/deadlock, where a simple snap of the thread stacks and
monitors is all that's required. (Note: it's also vital to have the
native stacks; for some reason this has often been problematic. From
a debugging point of view, getting this right outweighs a lot of other
stuff in importance.)

2) Java OOM conditions. For the vast majority of OOMs, a simple heap
dump is all that is required. This should include the heap roots,
class loaders and refs from one object to another but things like
non-reference fields do not need to be included (see next category).
(Disclaimer: I created the IBM PHD format which is clearly showing its
age. But at the time it was thought to be important to have a fast and
lightweight format).

3) Other. For more complex problem determination that doesn't fit into
the above categories you almost always want a complete system dump.
Storage is cheap and size is not an issue. Often the person who
collects the dump (whether test or at a customer site) is not the
person who later analyzes it. There is nothing worse than being given
a dump that doesn't contain everything you need. So I'm a great fan of
system dumps even for things like OOM failures (e.g. you get people
saying "well, can we see the contents of that String object?"). The main
thing is a nice easy API to let you get at the stuff you need in the

So for your four examples, for example 1 I would say something like
the current java thread dump would suffice (depending on what you mean
by behaving incorrectly - otherwise a system dump).

Example 2 - not sure what you mean by "about to throw". Isn't Kato
meant to be for post-mortem dump analysis? We have a way (at least at
IBM) to force a system dump when an exception occurs and that's one
way of doing it, but most app developers would either use logging or a
debugger in this scenario?

Example 3 - a heapdump should suffice.

Example 4 - use a system dump since you want to see the non-reference
fields of an object which are not stored in a heapdump.

The format of the dump (thread, heap, system) is immaterial - the main
thing is the consistent API to read them.
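
As a rough illustration of what "one API, three dump types" could look
like, here is a minimal sketch. All names (DumpKind, DumpReader,
HeapDumpReader) are hypothetical and are not the Kato or DTFJ API. The
idea is that a caller asks the same questions of any dump and gets an
empty answer when that dump type does not carry the data.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical names throughout; this is not the Kato or DTFJ API.
enum DumpKind { THREAD, HEAP, SYSTEM }

// One reader interface regardless of the underlying dump format.
interface DumpReader {
    DumpKind kind();

    // Thread names; an empty list if the dump carries no thread data.
    List<String> threadNames();

    // Outgoing references of an object; empty if the object is unknown
    // or the dump carries no heap graph.
    Optional<List<Long>> referencesFrom(long objectId);

    // Non-reference field contents; per the discussion above, only a
    // system dump would be able to answer this.
    Optional<String> fieldValue(long objectId, String fieldName);
}

// Toy in-memory stand-in for a heap dump: references only, no field values.
final class HeapDumpReader implements DumpReader {
    private final Map<Long, List<Long>> refs;

    HeapDumpReader(Map<Long, List<Long>> refs) {
        this.refs = refs;
    }

    public DumpKind kind() { return DumpKind.HEAP; }

    public List<String> threadNames() { return List.of(); }

    public Optional<List<Long>> referencesFrom(long objectId) {
        return Optional.ofNullable(refs.get(objectId));
    }

    public Optional<String> fieldValue(long objectId, String fieldName) {
        return Optional.empty(); // a heap dump does not store these
    }
}
```

With something shaped like this, analysis code written against
DumpReader would not need to care which dump it was handed; it would
only need to cope with an empty result.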

So personally I would just stick to those three dump types and not try
to over-egg the pudding.



On Mon, Jan 4, 2010 at 3:19 PM, Steve Poole <> wrote:
> I want to present a few application development usecases for us to discuss
> to do with the Snapshot concept.
> First let's agree some basics
> An application snapshot is a deliberate subset of a running system.  It's
> deliberate in that what is defined for inclusion / exclusion can be changed.
> To build an application snapshot three types of information are required
> 1 - root objects which determine where to start the snapshot from
> 2 - rules to restrict how "far" to search
> 3 - rules to describe what to include in the snapshot
> Example 1
> A webserver session is behaving incorrectly.  The application programmer
> wants to capture information
> to do with the session itself and some specific application details.
> The root object is an HTTPSession and the application code is all contained
> in packages that start org.acme.application
> The application programmer believes that the problem is in the application
> and so wants the minimum
> of system information included.
> Example 2
> An error has occurred and the application is about to throw an exception.  The
> application programmer wants to capture a series
> of objects related to the error and wants to see the Java stack including
> local variables.
> Example 3
> An enterprise server is running slow and the application programmer would
> like to get a list of the instances of a specific interface
> that the server is managing, since the suspicion is that there are multiple
> instances of a nonthreadsafe instance of this interface when there should
> only be one.
> Example 4
> A servlet server has run out of database connections and the suspicion is
> that managed objects have not been
> collected.  Typically programmers use finalisers or depend on GC instead of
> using the Servlet.destroy() method.
> The application programmer needs to get a list of objects in "Destroyed"
> state which obviously haven't been GC'd.
> Cheers
> --
> Steve
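
The three snapshot inputs listed in the quoted message (root objects, a
rule limiting how far to search, and a rule for what to include) can be
sketched as a bounded breadth-first walk. This is only an illustration
under assumptions: the object graph is modelled as a map from an object
label to the labels it references, the "how far" rule is a simple depth
limit, and SnapshotSketch and its parameters are hypothetical names, not
anything defined by Kato.

```java
import java.util.ArrayDeque;
import java.util.Collection;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch; not part of any real snapshot API.
final class SnapshotSketch {

    // graph:    object label -> labels of the objects it references
    // roots:    where the snapshot starts (input 1)
    // maxDepth: how many reference hops to follow (input 2)
    // include:  which visited objects end up in the snapshot (input 3)
    static Set<String> snapshot(Map<String, List<String>> graph,
                                Collection<String> roots,
                                int maxDepth,
                                Predicate<String> include) {
        Set<String> visited = new HashSet<>();
        Set<String> result = new LinkedHashSet<>();
        Deque<Map.Entry<String, Integer>> queue = new ArrayDeque<>();

        for (String root : roots) {
            if (visited.add(root)) {
                queue.add(Map.entry(root, 0));
            }
        }
        while (!queue.isEmpty()) {
            Map.Entry<String, Integer> entry = queue.poll();
            String node = entry.getKey();
            int depth = entry.getValue();

            if (include.test(node)) {
                result.add(node);
            }
            if (depth == maxDepth) {
                continue; // the "how far" rule: stop following references here
            }
            for (String next : graph.getOrDefault(node, List.of())) {
                if (visited.add(next)) {
                    queue.add(Map.entry(next, depth + 1));
                }
            }
        }
        return result;
    }
}
```

Note one design choice in this sketch: objects rejected by the include
rule are still traversed, so an application object reachable only
through a system object is not lost. Whether a real snapshot should
behave that way is an open question, not something the quoted message
settles.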
