incubator-kato-spec mailing list archives

From Steve Poole <>
Subject Re: Snapshot Application development usecases
Date Mon, 04 Jan 2010 23:43:45 GMT
Hi David and welcome to the party - never too late...

I started writing this as a quick note but it has turned into a sermon+rant.
It's not aimed at you but hopefully it's still informative.  I should write
up a better introduction and post it to the wiki.

For background you need to understand that we are not creating a new DTFJ.
The main driver now is towards helping Java developers solve their
application problems - not to help JVM developers solve JVM problems.  It's
a postmortem process for sure, but we are considering how to address two
other requirements: the need to deal with dumps that are getting larger
and are already unwieldy, plus the need to provide a subset of information
from a running system for trend analysis.

You also need to consider that we have to build a reference implementation
for the API  and since this is an Apache project we are doing that in an
open way using open data.  The proprietary information held within any IBM
dump is inaccessible to us unless IBM chooses to publish the data.   That's
sort of  true for the Sun JVM too - we can't write a dump reader that
understands the intricacies of a Sun JVM in Apache as the code available to
us is GPL.   It would be possible for someone to start a GPL licenced
project and produce the required code to match the JSR 326 API but that
would be GPL code too so again of limited use to us.

The two options available to us are

a) a licensee of the Sun JVM code writes an implementation and makes it
available under a suitable licence
b) we reduce our dependency on the data we need from a JVM to a level where
it is reasonable to declare it an optional part of the API.

I should also point out that none of the Sun JVM Licensees I've spoken to
see the value in providing the capability of reading Java info from a
corefile.   It may be when the Sun/Oracle acquisition is completed that we
can revisit this idea.

For now we are left with having to deal with the dumps and APIs that we
have today.  If we need more information from the JVM than we have
available today then we need to be careful about how and where this
information would be exposed to us.  There is nothing wrong with us
specifying APIs to be added or extended in the JVM, but if they never get
implemented it's a pointless exercise and could jeopardise the project.

Where does that leave us?  As of today the JSR will finesse the
relationship between the JavaRuntime API and the Image API so that it's not a
requirement for implementations to have to do the whole of the API.  Since
the only existing public dump format available to us is HPROF, and that
doesn't contain enough data for us, the Apache Kato project is producing its
own dump(s).  That's in line with what we needed to do anyway for the RI (to
support the Snapshot requirements); it's just a shame that we've not had the
buy-in from JVM vendors we expected.

The snapshot API is a runtime API which will be used to trigger dumps in a
standard, programmatic way and will allow an application programmer to
restrict the data that will appear in the dump.  We also need to consider
what extra information (such as statistical and/or JVM runtime data) is needed
but is not accessible to us today.  The data selection mechanism has to be
declarative as we shouldn't dictate how the implementation would work.
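As a rough illustration of what "declarative" could mean here, the application might hand over a plain description of what it wants and leave the how entirely to the JVM.  This is only a sketch: SnapshotSpec, its builder and every method name below are made up for this note and are not part of JSR 326 or the Kato API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, for illustration only: not JSR 326 or Apache Kato API.
// The spec is immutable data. It states what to capture and says nothing
// about how a JVM would go about capturing it.
final class SnapshotSpec {
    private final List<Object> roots;
    private final int maxDepth;
    private final List<String> includedPackages;

    private SnapshotSpec(Builder b) {
        this.roots = Collections.unmodifiableList(new ArrayList<Object>(b.roots));
        this.maxDepth = b.maxDepth;
        this.includedPackages =
                Collections.unmodifiableList(new ArrayList<String>(b.includedPackages));
    }

    List<Object> roots() { return roots; }

    int maxDepth() { return maxDepth; }

    // True when the class name sits inside one of the included packages
    // (or one of their sub-packages).
    boolean includes(String className) {
        for (String pkg : includedPackages) {
            if (className.startsWith(pkg + ".")) {
                return true;
            }
        }
        return false;
    }

    static Builder builder() { return new Builder(); }

    static final class Builder {
        private final List<Object> roots = new ArrayList<Object>();
        private final List<String> includedPackages = new ArrayList<String>();
        private int maxDepth = Integer.MAX_VALUE; // default: no depth limit

        Builder root(Object o) { roots.add(o); return this; }

        Builder maxDepth(int depth) {
            if (depth < 0) {
                throw new IllegalArgumentException("maxDepth must be >= 0");
            }
            this.maxDepth = depth;
            return this;
        }

        Builder includePackage(String pkg) { includedPackages.add(pkg); return this; }

        SnapshotSpec build() { return new SnapshotSpec(this); }
    }
}
```

A caller would build a spec such as `SnapshotSpec.builder().root(session).maxDepth(3).includePackage("org.acme.application").build()` and pass it to whatever trigger call the API ends up defining.  Keeping the spec as data rather than callbacks is what leaves the implementation free to satisfy it however it likes.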

So this thread is about the mechanism that would be used in a running JVM by
an application programmer to trigger a dump with just the information that
they need to solve their particular problem.
I want to work out how such a declarative mechanism would work in

More comments below

On Mon, Jan 4, 2010 at 5:53 PM, David Griffiths wrote:

> Greetings. My name is Dave Griffiths and I've worked in IBM's Java
> Service group for many years. I have experience with Kato's
> predecessor DTFJ and various related technologies. I've been meaning
> to subscribe to this mailing list for some time so apologies if I'm a
> bit late to the party.
> Just some random thoughts on the concept of snapshot dumps.
> Problem analysis seems to fall into three main categories:
> 1) high cpu/deadlock where a simple snap of the thread stacks and
> monitors is all that's required. (Note: it's also vital to have the
> native stacks - for some reason this has often been problematic. From
> a debugging point of view getting this right dominates a lot of other
> stuff in terms of importance).
Yep - agree with this idea and the level of importance.  How, as a
reference implementation, do we get at the native stacks?  Note that it's
possible, if we can keep the size of the requirement small enough, to
finesse this from the spec point of view and just make it an optional
capability that the RI does not implement.
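To show where the line currently falls, here is a minimal sketch of what a running JVM already exposes for this category through the standard java.lang.management API.  It captures Java stacks, held monitors and deadlock detection, and nothing native, which is exactly the gap in question.  The class name ThreadSnapshot and the text layout are made up for this note; only the ThreadMXBean calls are real.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MonitorInfo;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Illustrative helper (hypothetical name): Java-level thread stacks and
// monitors only. Native frames are not available through this API.
final class ThreadSnapshot {

    // Returns a text dump of every live thread's Java stack and held monitors,
    // followed by a count of threads the JVM reports as deadlocked.
    static String capture() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        boolean monitors = threads.isObjectMonitorUsageSupported();
        boolean synchronizers = threads.isSynchronizerUsageSupported();

        StringBuilder out = new StringBuilder();
        for (ThreadInfo info : threads.dumpAllThreads(monitors, synchronizers)) {
            if (info == null) {
                continue;
            }
            out.append('"').append(info.getThreadName()).append("\" ")
               .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                out.append("    at ").append(frame).append('\n');
            }
            for (MonitorInfo held : info.getLockedMonitors()) {
                out.append("    holds ").append(held.getClassName())
                   .append(" (locked at stack depth ")
                   .append(held.getLockedStackDepth()).append(")\n");
            }
        }

        // findDeadlockedThreads() needs synchronizer support; fall back to the
        // monitor-only variant otherwise. Both return null when nothing is deadlocked.
        long[] deadlocked = synchronizers
                ? threads.findDeadlockedThreads()
                : threads.findMonitorDeadlockedThreads();
        out.append("deadlocked threads: ")
           .append(deadlocked == null ? 0 : deadlocked.length).append('\n');
        return out.toString();
    }
}
```

Calling `ThreadSnapshot.capture()` from a signal handler thread or a watchdog gives roughly the content of a Java thread dump, so anything beyond this (native stacks in particular) is what would have to become an optional capability.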

> 2) Java OOM conditions. For the vast majority of OOMs, a simple heap
> dump is all that is required. This should include the heap roots,
> class loaders and refs from one object to another but things like
> non-reference fields do not need to be included (see next category).
> (Disclaimer: I created the IBM PHD format which is clearly showing its
> age. But at the time it was thought to be important to have a fast and
> lightweight format).
Is there any reason why we might want to restrict the contents of a dump that
is being used to solve OOMs?  If not then this is probably not something
to consider for the Snapshot discussion.

> 3) Other. For more complex problem determination that doesn't fit into
> the above categories you almost always want a complete system dump.
> Storage is cheap and size is not an issue. Often the person who
> collects the dump (whether test or at a customer site) is not the
> person who later analyzes it. There is nothing worse than being given
> a dump that doesn't contain everything you need. So I'm a great fan of
> system dumps even for things like OOM failures (eg you get people
> saying "well can we see the contents of that String object?") The main
> thing is a nice easy API to let you get at the stuff you need in the
> dump.
As I said at the beginning, the system dump (or at least any Java info in
it) is inaccessible to us. In this particular topic we are actually exploring
how to define something that is by its very nature not a system dump :-)

> So for your four examples, for example 1 I would say something like
> the current java thread dump would suffice (depending on what you mean
> by behaving incorrectly - otherwise a system dump).
> Example 2 - not sure what you mean by "about to throw". Isn't Kato
> meant to be for post-mortem dump analysis? We have a way (at least at
> IBM) to force a system dump when an exception occurs and that's one
> way of doing it, but most app developers would either use logging or a
> debugger in this scenario?
If you imagine it in IBM terms, we are thinking about a sort of FFDC
(First Failure Data Capture) for application developers.

> Example 3 - a heapdump should suffice.
> Example 4 - use a system dump since you want to see the non-reference
> fields of an object which are not stored in a heapdump.
> The format of the dump (thread, heap, system) is immaterial - the main
> thing is the consistent API to read them.
> So personally I would just stick to those three dump types and not try
> to over-egg the pudding.
> Cheers,
> Dave
> On Mon, Jan 4, 2010 at 3:19 PM, Steve Poole wrote:
> > I want to present a few application development usecases for us to
> > discuss to do with the Snapshot concept.
> >
> > First let's agree some basics
> >
> > An application snapshot is a deliberate subset of a running system.  It's
> > deliberate in that what is defined for inclusion / exclusion can be
> > changed.
> >
> > To build an application snapshot three types of information are required
> >
> > 1 - root objects which determine where to start the snapshot from
> > 2 - rules to restrict how "far" to search
> > 3 - rules to describe what to include in the snapshot
> >
> >
> > Example 1
> >
> > A webserver session is behaving incorrectly.  The application programmer
> > wants to capture information
> > to do with the session itself and some specific application details.
> >
> > The root object is an HTTPSession and the application code is all
> > contained in packages that start with org.acme.application.
> > The application programmer believes that the problem is in the
> > application and so wants the minimum of system information included.
> >
> >
> > Example 2
> >
> > An error has occurred and the application is about to throw an exception.
> > The application programmer wants to capture a series
> > of objects related to the error and wants to see the Java stack including
> > local variables.
> >
> >
> > Example 3
> >
> > An enterprise server is running slow and the application programmer would
> > like to get a list of the instances of a specific interface
> > that the server is managing, since the suspicion is that there are
> > multiple instances of a non-thread-safe implementation of this interface
> > when there should only be one.
> >
> > Example 4
> >
> > A servlet server has run out of database connections and the suspicion is
> > that managed objects have not been
> > collected.  Typically programmers use finalisers or depend on GC instead
> > of using the Servlet.destroy() method.
> > The application programmer needs to get a list of objects in "Destroyed"
> > state which obviously haven't been GC'd.
> >
> >
> > Cheers
> >
> > --
> > Steve
> >
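The three ingredients listed in the quoted use cases (root objects, a rule for how "far" to search, a rule for what to include) can be pictured with a small reflective walk.  This is a sketch under stated assumptions, not how an RI would do it: BoundedWalk and its parameters are invented for this note, it runs inside the application rather than the JVM, and it neither expands arrays nor follows references through objects that fail the include rule (so anything reachable only via a JDK collection is missed).

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical illustration of "roots + depth limit + include rule".
final class BoundedWalk {

    // One queued object together with its distance from the root.
    private static final class Step {
        final Object target;
        final int depth;
        Step(Object target, int depth) { this.target = target; this.depth = depth; }
    }

    // Breadth-first walk from root over instance reference fields. Objects are
    // collected only if their class passes 'include', and no object further
    // than maxDepth references from the root is collected.
    static Set<Object> collect(Object root, int maxDepth, Predicate<Class<?>> include) {
        Set<Object> seen = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        ArrayDeque<Step> queue = new ArrayDeque<Step>();
        if (root != null && include.test(root.getClass())) {
            seen.add(root);
            queue.add(new Step(root, 0));
        }
        while (!queue.isEmpty()) {
            Step step = queue.remove();
            if (step.depth >= maxDepth) {
                continue; // depth limit reached: keep the object, do not expand it
            }
            // Inspect fields declared by the object's class and by any included superclasses.
            for (Class<?> c = step.target.getClass();
                 c != null && include.test(c);
                 c = c.getSuperclass()) {
                for (Field f : c.getDeclaredFields()) {
                    if (f.getType().isPrimitive() || Modifier.isStatic(f.getModifiers())) {
                        continue;
                    }
                    Object value;
                    try {
                        f.setAccessible(true);
                        value = f.get(step.target);
                    } catch (ReflectiveOperationException | RuntimeException e) {
                        continue; // field cannot be read reflectively: skip it
                    }
                    if (value != null && include.test(value.getClass()) && seen.add(value)) {
                        queue.add(new Step(value, step.depth + 1));
                    }
                }
            }
        }
        return seen;
    }
}
```

For Example 1 the call might look like `BoundedWalk.collect(session, 3, c -> c.getName().startsWith("org.acme.application."))`, where `session` and the depth of 3 are placeholders.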

