incubator-kato-spec mailing list archives

From "Bobrovsky, Konstantin S" <konstantin.s.bobrov...@intel.com>
Subject RE: JSR 326: Post mortem JVM Diagnostics API - Developing the Specification
Date Mon, 29 Dec 2008 13:27:39 GMT
Hi Steve, all

The document you sent looks like a great starting point to me, and the list of possible tools
is reasonable as well. Here are a couple more suggestions in this area:

1) "Retrospector"
One of the rationales behind JSR-326 was that the complexity of failure and performance
analysis increases as CPU core counts grow. I believe it would be good to have a tool
showcasing this rationale among the first cohort of sample tools.

Here is a description of one such tool - let's call it "retrospector" for now. I admit it
might already exist somewhere in the Java universe, but it seems to fit well into JSR-326 anyway.
        The idea is to define a number of landmarks or checkpoints in the code and see how
their mutual temporal ordering and latency change (a) over time, (b) depending on the number
of threads, (c) depending on the number of available CPU cores, (d) ... Checkpoints can be of
different kinds:
- well-known events defined by JVMTI (monitor enter/exit, method entry/exit, class loads,
code generation, thread lifecycle events, ...)
- new kinds of events that help solve particular problems: for example, the interpreted ->
compiled execution mode change for a method, reaching a safepoint (in HotSpot terms), native
memory allocation from JNI code, or an arbitrary checkpoint specified by the user as a
<method, bytecode position> pair (with optional additional refinements or filtering), etc.
        When a checkpoint is reached by a thread, the JRE quickly logs this fact internally
by adding a record <checkpoint ID, thread ID, timestamp, pc, ...> to a buffer which, at an
appropriate time, is processed and made available via the JSR-326 API. [Note: the logging
should be extremely efficient to minimize the "observer effect", which seems quite
achievable with a thread-local checkpoint event buffer.]
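
Just to illustrate the recording side, here is a rough sketch in Java (all names are
hypothetical and only illustrate the idea; in a real JRE the fast path would live inside
the VM):

    // Hypothetical sketch of a thread-local checkpoint event buffer.
    final class CheckpointRecord {
        final int checkpointId;
        final long threadId;
        final long timestampNanos;
        final long pc; // program counter, if available

        CheckpointRecord(int checkpointId, long threadId, long timestampNanos, long pc) {
            this.checkpointId = checkpointId;
            this.threadId = threadId;
            this.timestampNanos = timestampNanos;
            this.pc = pc;
        }
    }

    final class CheckpointLogger {
        // One buffer per thread, so logging requires no synchronization.
        private static final ThreadLocal<java.util.ArrayList<CheckpointRecord>> BUFFER =
                ThreadLocal.withInitial(java.util.ArrayList::new);

        // Called when a thread reaches a checkpoint; must be as cheap as possible.
        static void hit(int checkpointId) {
            BUFFER.get().add(new CheckpointRecord(
                    checkpointId,
                    Thread.currentThread().getId(),
                    System.nanoTime(),
                    0L)); // the pc would be filled in by the VM, not by Java code
        }

        // Drained at an appropriate time and handed to the JSR-326 API.
        static java.util.List<CheckpointRecord> drain() {
            java.util.List<CheckpointRecord> out = BUFFER.get();
            BUFFER.set(new java.util.ArrayList<>());
            return out;
        }
    }
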
        The retrospector tool will be able to load two or more checkpoint logs and
- split each of them into per-thread event streams (see the sketch below)
- visualize the checkpoint events mapped onto a timescale
- visualize the difference in temporal behavior between two threads within the same run of
the analyzed app, or between two threads in different runs of the analyzed app
- ...
        This will greatly assist in analyzing the temporal behavior of an application and
its scalability.
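
As a sketch of the analysis side (reusing the hypothetical CheckpointRecord above), the
per-thread split is a simple grouping step:

    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    final class RetrospectorAnalysis {
        // Split one checkpoint log into per-thread event streams, each ordered
        // by timestamp and ready to be mapped onto a timescale.
        static Map<Long, List<CheckpointRecord>> perThread(List<CheckpointRecord> log) {
            return log.stream()
                    .sorted(Comparator.comparingLong(r -> r.timestampNanos))
                    .collect(Collectors.groupingBy(r -> r.threadId));
        }
    }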

Another tool that could easily be implemented on top of the checkpoint traces is a deadlock
analyzer (most likely one already exists, but it shows the applicability of this kind of
data), provided that monitor state change checkpoints have been recorded. The tool can
analyze whether any two threads acquire the same monitors in different orders.
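
Here is a minimal sketch of that check, assuming the trace contains monitor enter/exit
checkpoints carrying a monitor ID (again, all names are hypothetical):

    import java.util.*;

    final class LockOrderAnalyzer {
        // Edge A -> B means "some thread acquired monitor B while holding A".
        private final Map<Long, Set<Long>> order = new HashMap<>();
        // Monitors currently held by each thread while replaying its trace.
        private final Map<Long, Deque<Long>> held = new HashMap<>();

        void monitorEnter(long threadId, long monitorId) {
            Deque<Long> stack = held.computeIfAbsent(threadId, t -> new ArrayDeque<>());
            for (long outer : stack) {
                order.computeIfAbsent(outer, m -> new HashSet<>()).add(monitorId);
            }
            stack.push(monitorId);
        }

        void monitorExit(long threadId, long monitorId) {
            Deque<Long> stack = held.get(threadId);
            if (stack != null) stack.remove(monitorId);
        }

        // A potential deadlock exists if two monitors were ever taken in both orders.
        List<long[]> inconsistentPairs() {
            List<long[]> result = new ArrayList<>();
            for (Map.Entry<Long, Set<Long>> e : order.entrySet()) {
                for (long inner : e.getValue()) {
                    Set<Long> reverse = order.get(inner);
                    if (reverse != null && reverse.contains(e.getKey()) && e.getKey() < inner) {
                        result.add(new long[] { e.getKey(), inner });
                    }
                }
            }
            return result;
        }
    }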

An important question is how checkpoints can be implemented - can this be done without
modifying the JVM or not? Some kinds of checkpoints can be implemented via bytecode
instrumentation; others - such as internal VM lock acquisition - cannot.
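
For the bytecode instrumentation case, the entry point could be an ordinary
java.lang.instrument agent. The actual rewriting - inserting a call to the hypothetical
CheckpointLogger.hit() at the requested <method, bytecode position> - would be done with
a bytecode library such as ASM and is elided here:

    import java.lang.instrument.ClassFileTransformer;
    import java.lang.instrument.Instrumentation;
    import java.security.ProtectionDomain;

    public final class CheckpointAgent {
        // Registered via -javaagent:checkpoint-agent.jar
        public static void premain(String args, Instrumentation inst) {
            inst.addTransformer(new ClassFileTransformer() {
                @Override
                public byte[] transform(ClassLoader loader, String className,
                                        Class<?> classBeingRedefined,
                                        ProtectionDomain protectionDomain,
                                        byte[] classfileBuffer) {
                    if (!"com/example/Target".equals(className)) {
                        return null; // null means "leave this class unchanged"
                    }
                    // Rewrite the method bytecode here (e.g. with ASM) to insert
                    // CheckpointLogger.hit(id) at the requested bytecode position,
                    // then return the modified class bytes.
                    return null; // placeholder: no rewriting in this sketch
                }
            });
        }
    }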

2) "Debugger with memory".
This is a small addition to the runtime, which, for each thread, logs information about last
N branch targets leading to current PC (regardless of whether branch targets an the PC belong
to the generated or JVM code). JSR-326-enabled debugger could expose this data to the user
to ease debugging. In my own experience, the need in such information arises pretty regularly
when investigating crashes.
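
A sketch of the data structure only - the branch records themselves would have to be
captured by the VM (or by hardware facilities such as the x86 last branch record MSRs),
not by Java code:

    // Hypothetical per-thread ring buffer holding the last N branch targets.
    final class BranchHistory {
        private final long[] from;
        private final long[] to;
        private int next;        // next slot to overwrite
        private boolean wrapped; // true once the buffer has filled up

        BranchHistory(int capacity) {
            this.from = new long[capacity];
            this.to = new long[capacity];
        }

        // Called (conceptually by the VM) on every recorded branch.
        void record(long fromPc, long toPc) {
            from[next] = fromPc;
            to[next] = toPc;
            next = (next + 1) % from.length;
            if (next == 0) wrapped = true;
        }

        // Returns "from -> to" pairs, oldest first, for the debugger to display.
        java.util.List<String> dump() {
            java.util.List<String> out = new java.util.ArrayList<>();
            int count = wrapped ? from.length : next;
            int start = wrapped ? next : 0;
            for (int i = 0; i < count; i++) {
                int idx = (start + i) % from.length;
                out.add(String.format("0x%x -> 0x%x", from[idx], to[idx]));
            }
            return out;
        }
    }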


Also, JFYI, there is one more interesting tool which I don't recall being mentioned -
JRockit's "flight recorder" technology:
http://edocs.bea.com/jrockit/geninfo/diagnos/intromiscon.html
Maybe it can help suggest some more tool ideas JSR-326 can cover.

Thanks,
Konst

Closed Joint Stock Company Intel A/O
Registered legal address: Krylatsky Hills Business Park,
17 Krylatskaya Str., Bldg 4, Moscow 121614,
Russian Federation


-----Original Message-----
From: Steve Poole [mailto:spoole167@googlemail.com]
Sent: Friday, December 12, 2008 9:43 PM
To: kato-spec@incubator.apache.org
Subject: JSR 326: Post mortem JVM Diagnostics API - Developing the Specification

Greetings

I had intended to post this document as a wiki page but we don't have a wiki
yet!

The following document is a work in progress. It is ultimately intended to capture the
approach, process and scope for the development of this specification. This is very much
an initial brain dump, so please feel free to point out any inaccuracies, omissions,
trademark violations etc.!

In particular I would like to receive feedback on the list of proposed sample tools. The
tools are proposed as examples that could be developed to demonstrate the validity of the
specification. It's likely that not all of these tools are necessary or doable in the
timescales of the first release.

EG members - please let me know one way or another if you consider the list to be
acceptable and descriptive enough for us to start expanding it into more detailed user
stories. Note we may not end up actually producing all of these tools - but that should
not stop us as specification designers from defining the necessary user stories.

Thanks

Steve


---------------------------------------

JSR 326: Post mortem JVM Diagnostics API - Developing the Specification

Version Info

Initial: Steve Poole, 12 Dec 2008

1.0: JSR Objectives

Define a standard Java API to support the generation and consumption of post mortem or
snapshot Java diagnostic artifacts. The specification will be inclusive of a range of
existing "in field" diagnostic artifacts, including common operating system dump formats.
The specification will balance the need to provide maximum value from existing artifacts
against the problem space expected to be encountered in the near and longer term future,
with multiple language environments and very large heaps.


2.0: Approach

The design of the API will be driven directly by user stories. To ensure coherence between
user stories, the stories will themselves be developed as requirements on several sample
tools. The project does not seek to create state-of-the-art tools, but recognises that
having useful and usable sample tools is crucial in demonstrating the validity of the API
and will encourage others to build alternative "better mouse-traps". These example tools
will also help define the non-functional characteristics that are not easily translated
into user stories - characteristics such as scalability, performance, tracing etc.

These tools and the embodiment of the JSR specification - i.e. the reference
implementation (RI) and Technology Compatibility Kit (TCK) - are being developed as an
Apache Software Foundation Incubator project. The JSR Expert Group (EG) and the Reference
Implementation developers will work together to define, develop and refine the
specification. The specification is intended to be developed incrementally and will
always be available via the RI API Javadoc. As the JSR moves through its various stages,
the specification at that point will be declared by referring to a publicly visible form
of the Javadoc and the associated repository revision.


3.0: Initial starting point.

IBM is contributing non-proprietary portions of its Diagnostic Tool Framework for Java
(DTFJ) and associated tools, samples, documentation and testcases. This contribution is
only a seed: the JSR EG must review and amend this API as necessary to meet its
requirements. EG members can also contribute directly to the specification by providing
testcases, code samples etc.


4.0: API Structure

Analysis of the types of dump suitable for inclusion in the scope of this JSR shows that
there are three basic categories: 1) dumps that contain process information, 2) dumps
that contain information about a Java runtime, and 3) dumps that are limited to the
contents of a Java heap. Generally these dumps are inclusive in the sense that, for
instance, a process dump normally contains a Java runtime, which in turn contains
information about the contents of the Java heap. The inverse is not true. This
categorisation is used in this document and will be used to help structure the
development of the JSR. The categorisation should not be assumed to be set in stone.
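
To make the nesting concrete, here is a minimal sketch of how the three categories might
relate as Java interfaces. The names are hypothetical; the actual shape of the API is for
the EG to decide, starting from the DTFJ contribution.

    import java.util.List;

    // Category 1: a process dump; may or may not contain a Java runtime.
    interface ProcessImage {
        List<JavaRuntime> javaRuntimes(); // empty for a non-Java process dump
        // ... native threads, modules, memory ranges, registers ...
    }

    // Category 2: a Java runtime dump; always contains heap information.
    interface JavaRuntime {
        JavaHeap heap();
        // ... Java threads, loaded classes, monitors ...
    }

    // Category 3: heap-only artifacts (e.g. HPROF) expose just this view.
    interface JavaHeap {
        Iterable<JavaObject> objects();
    }

    interface JavaObject {
        String className();
        long size();
    }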


5.0: Sample Tools - the primary drivers for developing  JSR User Stories



5.1: Process Explorer

An Eclipse plugin which allows presentation, navigation and simple querying of the
elements of a process dump. This tool will demonstrate how to explore a dump in an
efficient and high-performing manner. Key characteristics will include fast startup time,
handling of large quantities of data (including summarization), and effective navigation
to areas of interest.

5.2: Native Memory Analyser Tool

A program that can retrieve native memory allocations made by (or on behalf of) a Java
runtime and trace them back to the Java objects that hold the allocation reference. The
tool will be able to display what memory allocations exist and the contents of each
allocation, and conservatively identify which entities hold references to that
allocation. Ideally this tool will be able to point to the specific fields within Java
objects that hold the references. This tool will demonstrate the capabilities of the API
to find and display native memory obtained from a memory manager. Key characteristics
will include the performance of the API in exhaustively scanning a dump (for memory
allocation handles) and the ability to resolve an arbitrary location within the dump into
a Java object or similar entity.
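
As an illustration of the conservative back-reference scan, a sketch extending the
hypothetical JavaObject interface from section 4.0 ("conservative" here means any
pointer-sized field value equal to an allocation handle is treated as a reference):

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Set;

    final class NativeMemoryScan {
        // Hypothetical extension exposing long-valued fields (potential handles).
        interface JavaObjectWithFields extends JavaObject {
            Iterable<Long> pointerFieldValues();
        }

        // Find Java objects holding a field value matching a native allocation handle.
        static List<JavaObject> holders(Iterable<JavaObjectWithFields> objects,
                                        Set<Long> allocationHandles) {
            List<JavaObject> result = new ArrayList<>();
            for (JavaObjectWithFields obj : objects) {
                for (long value : obj.pointerFieldValues()) {
                    if (allocationHandles.contains(value)) {
                        result.add(obj);
                        break;
                    }
                }
            }
            return result;
        }
    }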

5.3: Java Runtime Explorer

Similar to the Process Explorer above, this Eclipse plugin will allow the presentation,
navigation and simple querying of the elements of a Java runtime dump. This tool will
demonstrate how to explore a Java runtime dump in an efficient and high-performing
manner. Ideally the plugin will also demonstrate the API's ability to provide some
support for virtualisation of Java runtime objects, so that implementation specifics
concerning objects within the java.lang and java.util packages can be hidden. Key
characteristics will include fast startup time, handling of large quantities of data
(including summarization), effective navigation to areas of interest, and useful
abstraction of key Java object implementation specifics.

5.4:  Runtime Investigator

A program that can examine a dump and provide guidance on common ailments. This tool will
provide analysis modules that can report on such items as deadlocks, heap occupancy etc.
The tool will provide extension points that will allow others to contribute new analysis
modules (see the sketch below). Key characteristics of this tool will include handling
large quantities of data efficiently (probably via a query language of some type),
ensuring the API is generally consumable by programmers, and ensuring the API provides
the data that is actually required to analyze real problems.
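
A sketch of what such an extension point might look like (hypothetical; whether it is
realised via Eclipse extension points, java.util.ServiceLoader or something else is an
open design question):

    // Hypothetical plug-in contract for analysis modules.
    interface AnalysisModule {
        String name();                       // e.g. "Deadlock analysis"
        Report analyze(JavaRuntime runtime); // JavaRuntime as sketched in section 4.0
    }

    final class Report {
        final String summary;
        Report(String summary) { this.summary = summary; }
    }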

5.5: Java Runtime Trend Analyzer

A tool that can compare multiple dumps and provide trend analysis. This tool will provide
analysis modules that can report on such items as heap growth etc. The tool will provide
extension points that will allow others to contribute new analysis modules. Key
characteristics of this tool will include exercising the creation of well-formed dumps,
fast startup time, correlation between dump objects, handling large quantities of data
efficiently (probably via a query language of some type), ensuring the API is generally
consumable by programmers, and ensuring the API provides the data that is actually
required to analyze real problems.
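
As an illustration of the heap growth case, a sketch comparing per-class instance counts
across two dumps (reusing the hypothetical JavaHeap/JavaObject interfaces from section 4.0):

    import java.util.HashMap;
    import java.util.Map;

    final class HeapGrowthAnalysis {
        // Count instances per class name in one dump.
        static Map<String, Long> histogram(JavaHeap heap) {
            Map<String, Long> counts = new HashMap<>();
            for (JavaObject obj : heap.objects()) {
                counts.merge(obj.className(), 1L, Long::sum);
            }
            return counts;
        }

        // Report classes whose instance count grew between two dumps.
        static Map<String, Long> growth(JavaHeap earlier, JavaHeap later) {
            Map<String, Long> before = histogram(earlier);
            Map<String, Long> delta = new HashMap<>();
            histogram(later).forEach((cls, n) -> {
                long diff = n - before.getOrDefault(cls, 0L);
                if (diff > 0) delta.put(cls, diff);
            });
            return delta;
        }
    }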


5.6: Java Debug Interface (JDI) Connector

An adapter that allows a Java debugger to interrogate the contents of a Java runtime
diagnostic artifact. This connector will enable capabilities similar to those that exist
today in other debuggers that can debug corefiles or similar process diagnostic
artifacts. This tool will demonstrate key characteristics such as effective navigation to
areas of interest, useful abstraction of key Java object implementation specifics, and
that the API provides the data that is required to analyze real problems.
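
From the debugger's point of view, attaching might look like the following. The connector
name and its "dump" argument key are hypothetical, but the surrounding connector-discovery
calls are the standard JDI API:

    import com.sun.jdi.Bootstrap;
    import com.sun.jdi.VirtualMachine;
    import com.sun.jdi.connect.AttachingConnector;
    import com.sun.jdi.connect.Connector;
    import java.util.Map;

    final class DumpAttach {
        static VirtualMachine attachToDump(String dumpPath) throws Exception {
            for (AttachingConnector c :
                    Bootstrap.virtualMachineManager().attachingConnectors()) {
                // "kato-dump-connector" is a hypothetical name for a JSR-326 connector.
                if (c.name().equals("kato-dump-connector")) {
                    Map<String, Connector.Argument> args = c.defaultArguments();
                    args.get("dump").setValue(dumpPath); // hypothetical argument key
                    return c.attach(args);
                }
            }
            throw new IllegalStateException("JSR-326 dump connector not installed");
        }
    }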


5.7: Memory Analyser Tool (MAT) Adapter

MAT (http://www.eclipse.org/mat/) is an open source project that consumes HPROF and
DTFJ-supported dumps. MAT is designed to help find memory leaks and reduce memory
consumption. An adapter for MAT will be developed that allows MAT to consume HPROF and
other dumps via the JSR 326 API. Key characteristics of this adapter will include
handling large quantities of data efficiently, useful abstraction of key Java object
implementation specifics, and dump type identification.



6.0: Reference Implementation Scope

The Reference Implementation will not provide implementations for all JVMs or diagnostic
artifacts. The scope of the project is to encompass only the open, public and most used
combinations. The initial proposal for the API defines three separate categories of
diagnostic artifact. The Reference Implementation will be developed to consume the
following diagnostic artifacts from those categories.

6.1: Process Level Diagnostic Artifacts

Operating System      / Diagnostic Artifact
Linux Ubuntu 8.10 x86 : ELF format Process Dump (1)
Microsoft Windows XP  : Microsoft userdump (2)
IBM AIX 6.1           : AIX corefile (3)

(1) The ELF format is publicly available and described in many places - usually starting
with elf.h!

(2) Microsoft userdumps are in minidump format. A description starts here:
http://msdn.microsoft.com/en-us/library/ms680378(VS.85).aspx

(3) The IBM AIX corefile format is publicly available:
http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/com.ibm.aix.files/doc/aixfiles/core.htm


6.2: Java Runtime Diagnostic Artifacts


JVM / Diagnostic Artifact
Sun Linux/Windows OpenJDK 6.0 JRE   : HPROF Binary format
Sun Windows OpenJDK 6.0 JRE         : Microsoft userdump
Sun Linux x86 OpenJDK 6.0 JRE       : ELF format Process Dump
Sun Linux/Windows OpenJDK 6.0 JRE   : Serviceability Agent API (1)
Sun Linux/Windows Java 1.4.2_19 JRE : HPROF Binary format (2)



(1) Assumes the classpath exception for the API can be granted by Sun Microsystems.
(2) Sun Java 1.4.2_19 JRE support will be on a best-effort basis, since information about
the internal structures of the JRE is not publicly available. In the event that critical
information is required, we will ask Sun Microsystems for help and request that they
publish the information.


6.3: Java Heap Diagnostic Artifacts

The Java Heap category of the API is effectively a subset of the Java Runtime API, and
thus the list below is the same as above.

JVM / Diagnostic Artifact
Sun Linux/Windows OpenJDK 6.0 JRE   : HPROF Binary format
Sun Windows OpenJDK 6.0 JRE         : Microsoft userdump
Sun Linux x86 OpenJDK 6.0 JRE       : ELF format Process Dump
Sun Linux/Windows OpenJDK 6.0 JRE   : Serviceability Agent API
Sun Linux/Windows Java 1.4.2_19 JRE : HPROF Binary format


6.4: Other dump formats and implementations

During this project IBM will be producing prototype implementations of the API for some
subset of IBM JREs and dump formats. This will provide welcome feedback on the API and
allow early adopters a broader set of environments to work with.


7.0: Closing the seed contribution shortfall

8.0: Timescales, schedule, milestones
