incubator-kato-spec mailing list archives

From Carmine Cristallo <>
Subject Design guidelines
Date Tue, 17 Feb 2009 16:38:32 GMT
The main purpose of this email is to outline some of the design
considerations to be taken into account during the development of the
API and to stimulate the discussion about them.

Some of the following sections will be better understood after a quick
look at the IBM DTFJ API, which will constitute the seed of the IBM
contribution to the Apache Kato project. Such sections will be clearly

1 General Principles

The following principles could be used as overall quality criteria for
the design of the API.

1.1 User Story driven: the design of the API will be driven by user
stories. As a general statement, no information should be exposed if
there is no user story justifying the need for it.

1.2 Consumability: users of the API should be able to easily
understand how to use the API from the API description and from common
repeated design patterns. The amount of boilerplate code necessary to
get at any useful information needs to be monitored. The user stories
supporting the API will aid in keeping the boilerplate down, but it's
important to state that the more understandable the API is, the easier
its adoption will be.

1.3 Consistency: common guidelines and patterns should be followed
when designing the API. For example, all the calls returning multiple
values must have a common way of doing it (i.e. Lists or Iterators or

1.4 Common tasks should be easy to implement: care should be taken to
design the API in such a way that common user stories have a simple
implementation scenario. For example, in the DTFJ API, in most cases
there will be only one JavaRuntime per Image, and there should be a
more direct way of getting it than iterating through the AddressSpaces
and Processes.

1.5 Backward compatibility: the client code written against a given
release of the API should remain source code compatible with any
future release of the API.
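
To make principle 1.4 a little more concrete, here is a minimal sketch of
a shortcut for the "one runtime per image" case. All the types below are
simplified, hypothetical stand-ins and not the real DTFJ interfaces; the
name getSingleRuntime() is an assumption made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for an Image -> AddressSpace -> Process -> Runtime
// containment hierarchy. These are NOT the real DTFJ interfaces.
class SketchRuntime {
    final String name;
    SketchRuntime(String name) { this.name = name; }
}

class SketchProcess {
    final List<SketchRuntime> runtimes = new ArrayList<>();
}

class SketchAddressSpace {
    final List<SketchProcess> processes = new ArrayList<>();
}

class SketchImage {
    final List<SketchAddressSpace> addressSpaces = new ArrayList<>();

    // Shortcut for the common case: returns the only runtime in the image,
    // or null when the image holds no runtime or more than one.
    SketchRuntime getSingleRuntime() {
        SketchRuntime found = null;
        for (SketchAddressSpace space : addressSpaces) {
            for (SketchProcess process : space.processes) {
                for (SketchRuntime runtime : process.runtimes) {
                    if (found != null) {
                        return null; // ambiguous: more than one runtime
                    }
                    found = runtime;
                }
            }
        }
        return found;
    }
}
```

Client code that only cares about the common case then needs a single
call, while the full iteration stays available for unusual dumps.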

2 Exception handling model

In the domain of postmortem analysis, the following types of
exceptions can occur:

2.1 File Access Exceptions: reported when an error occurs opening,
seeking, reading, writing or closing any of the files which constitute
the dump or are generated as a result of processing the dump.
Applications are expected to respond to this type of exception by
informing their users that they should correct the problem with their
file system (e.g. getting the file name right).

2.2 File Format Exceptions: reported when data is correctly read from
an encoded file, but that data is not compatible with the encoding
rules or syntax. Applications are supposed to respond to these
exceptions by informing their users that the file is corrupt and
further processing is impossible.

2.3 Operation Not Supported Exceptions: the type of dump file being
analysed does not support the invoked API.

2.4 Memory Access Exceptions: thrown when the address and length of a
data read request do not lie entirely within one of the valid
address ranges.

2.5 Corrupt Data Exceptions: reported when data is correctly read from
a file but it has a value incompatible with its nature. Corruption is
to be considered as a normal event in processing postmortem dumps,
therefore such exceptions are not to be treated as error conditions.

2.6 Data Not Available Exceptions: reported when the requested
information is not contained within the specific dump being analysed.
As for the previous case, this is not to be seen as an error

Exception handling in DTFJ is a major source of struggle. Almost every
call to the DTFJ API throws one exception of the last two types, or
both. There's no question about the fact that such events are
definitely better handled with checked exceptions rather than with
unchecked ones. On the other hand, the fact that some objects
retrieved from a dump can be corrupted or not available is an
intrinsic condition to every API call. Handling such conditions with
checked exceptions would put the burden of handling them onto the
client code, leading to almost every API call being wrapped by a
try/catch block. As a side effect, it has been noted from past
experience that in such situations client code tends to take a form like:

public void clientMethod1() {
      try {
              // several Kato API calls, one after another
      } catch (KatoException ke) {
              // a single handler for all of them
      }
}

rather than:

public void clientMethod2() {
      try {
              // first Kato API call
      } catch (KatoException ke1) {
      }
      try {
              // second Kato API call
      } catch (KatoException ke2) {
      }
      try {
              // third Kato API call
      } catch (KatoException ke3) {
      }
}

and this can lead to poor debuggability of the client code.

It is also true that in very few cases the client code will need to
implement different behaviours for the Data Unavailable and the
Corrupt Data cases: most of the time, they will be treated in the same
way, and the corrupt data, when available, will just be ignored. It
would make sense, therefore, to group the two cases under a single
name: let's therefore define "Invalid Data" as a situation where either
the data is not available or it's corrupted. So the key questions
become: "Does it make sense to think of a way to handle the Invalid
Data case without the use of exceptions? If yes, how?"

One possible solution to this problem could be to reserve the null
return value, in every API call, to Invalid Data: an API call returns
null if and only if the data being requested is either unavailable or
corrupted. To discriminate the two cases, the client code could call a
specific errno-like API which returns the corrupt data of the latest
API call, or null if the data was unavailable. Most of the time, the
client code would therefore look similar to:

public void clientMethod1() {
      KatoThing value;
      value = katoObject1.methodA();
      if (value == null) {
              // handle the invalid data
      }
}

although, in a small number of cases, the code might be more similar to this:

public void clientMethod2() {
      KatoThing value;
      value = katoObject1.methodA();
      if (value == null) {
              CorruptData cd = KatoHelper.getLastCorruptData();
              if (cd == null) {
                      // handle the data unavailable case
              } else {
                      // handle the corrupt data case
              }
      }
}

As a side effect, this solution would imply that primitive types
cannot be used as return values, and their corresponding object
wrappers would need to be used instead.
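
For discussion, here is a rough sketch of how the API side of the
errno-like idea could look. Only the names KatoHelper, CorruptData and
getLastCorruptData() come from the text above; the ThreadLocal storage,
the recordCorrupt()/recordUnavailable() hooks and the SketchJavaObject
class are assumptions made up for illustration.

```java
// Hypothetical stand-in for whatever type ends up describing corrupt data.
class CorruptData {
    final String description;
    CorruptData(String description) { this.description = description; }
}

// Sketch of the errno-like helper. Keeping one slot per thread is an
// assumption, chosen so that concurrent readers do not clobber each other.
class KatoHelper {
    private static final ThreadLocal<CorruptData> LAST = new ThreadLocal<>();

    // Returns the corrupt data behind the latest null result on this thread,
    // or null if that data was simply unavailable.
    static CorruptData getLastCorruptData() { return LAST.get(); }

    // Hypothetical hooks that API implementations call before returning null.
    static void recordCorrupt(CorruptData cd) { LAST.set(cd); }
    static void recordUnavailable() { LAST.remove(); }
}

// Hypothetical API object whose size may be valid, corrupt or unavailable.
class SketchJavaObject {
    private final Integer size;           // null means Invalid Data
    private final CorruptData corruption; // non-null only in the corrupt case

    private SketchJavaObject(Integer size, CorruptData corruption) {
        this.size = size;
        this.corruption = corruption;
    }

    static SketchJavaObject valid(int size) {
        return new SketchJavaObject(size, null);
    }

    static SketchJavaObject corrupt(String why) {
        return new SketchJavaObject(null, new CorruptData(why));
    }

    static SketchJavaObject unavailable() {
        return new SketchJavaObject(null, null);
    }

    // Integer rather than int, so that null can signal Invalid Data.
    Integer getSize() {
        if (size == null) {
            if (corruption != null) {
                KatoHelper.recordCorrupt(corruption);
            } else {
                KatoHelper.recordUnavailable();
            }
        }
        return size;
    }
}
```

Note that in this sketch the helper is only updated when a call returns
null, so, as with errno, its value is meaningful only immediately after
such a call.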

3 Optionality

The Kato API will be designed to support different types of dump
formats. Examples of dump formats are HPROF for SUN VMs, and system dumps,
Javacores and PHD for IBM VMs.

Different dump formats expose different information, so if we design
the API as a monolithic block, there will be cases in which some parts
of it – more or less large, depending on the dump format – may not be
Although the "Operation Not Supported Exception" case described above
does provide some support for these cases, we certainly need a better
mechanism to support optionality.
One possible solution lies in the consideration that we don't really
need to design for optionality at the method level: normally, dump
formats tend to focus on one or more "views" of the process that
generates them. Examples of these views are:

3.1 Process view: formats that support this view expose information
like command line, environment, native threads and locks, stack
frames, loaded libraries, memory layout, symbols, etc. System dumps
normally expose nearly all of these data.

3.2 Java Runtime view: formats supporting this view expose information
like VM args, Java Threads, Java Monitors, classloaders, heaps, heap
roots, compiled methods, etc. HPROF is an example of a format that
supports this view.

3.3 Java Heap view: formats supporting this view expose Java classes,
objects and their relationships. IBM PHD is an example of a dump format
supporting this view, as well as SUN's HPROF.

The API should be designed so that a given file format can support
one or more of these views, and so that new views can be
plugged in. Inside each view, it could be reasonable to provide a
further level of granularity involving Operation Not Supported
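
One way to picture view-level optionality is a dump object that hands
out views on request. This is only a sketch: the interface names, their
single methods and getView() are all hypothetical. It uses Optional
rather than null so that "view not supported" is not confused with the
Invalid Data convention discussed in section 2.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical view interfaces, one per view listed above. Each has a
// single illustrative method; a real view would expose much more.
interface ProcessView { String getCommandLine(); }
interface JavaRuntimeView { String getVmArgs(); }
interface JavaHeapView { long getObjectCount(); }

// Hypothetical dump handle: each format reader registers the views it can
// provide, and new view types can be plugged in without changing this class.
class SketchDump {
    private final Map<Class<?>, Object> views = new HashMap<>();

    <T> void registerView(Class<T> type, T view) {
        views.put(type, view);
    }

    // Empty when the underlying dump format does not support the view.
    <T> Optional<T> getView(Class<T> type) {
        return Optional.ofNullable(type.cast(views.get(type)));
    }
}
```

A reader for a heap-only format would register just a JavaHeapView, and
a client asking for a ProcessView would get an empty result up front
instead of hitting unsupported operations call by call.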

4 Data access models

In designing the data access models, it should be kept in mind that
the API may have to deal with dumps whose size is vastly
greater than available memory. Therefore – and this holds especially
for Java objects in the heap view – creating all the objects in memory
at the moment the dump is opened may not be a good idea.
In this context, user stories will dictate the way data is accessed
from the dump. If it turns out that heap browsing starting from
the roots or dominator tree browsing are major use cases, for
example, it makes sense to think about loading the children of a node
object lazily at the moment the parent object is first displayed, and
not any earlier. A first summary of the ways of accessing objects
could be the following:

4.1 retrieve the Java object located at a given address in memory (if
memory is available in the dump, i.e. if the dump supports a Process
view);
4.2 retrieve all the heap root objects;
4.3 for any given object, retrieve the objects referenced by it;
4.4 retrieve all objects satisfying a given query (e.g. all objects of
class java.lang.String, or all objects of any class having a field
named "value"). This will involve having a query language of some form
built into the API.
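
As an illustration of the lazy loading idea behind 4.3, here is a small
sketch in which the objects referenced by a node are read from the dump
only when first asked for. LazyNode and the referenceReader function are
hypothetical; the reader stands in for whatever code actually decodes
outgoing references from the dump file.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

// Hypothetical heap node that loads its outgoing references on demand.
class LazyNode {
    final long address;
    // Assumed callback: given an object address, returns the addresses of
    // the objects it references, as decoded from the dump.
    private final LongFunction<long[]> referenceReader;
    private List<LazyNode> references; // null until first requested

    LazyNode(long address, LongFunction<long[]> referenceReader) {
        this.address = address;
        this.referenceReader = referenceReader;
    }

    // Reads the references from the dump on the first call only; later
    // calls return the cached list.
    List<LazyNode> getReferences() {
        if (references == null) {
            long[] targets = referenceReader.apply(address);
            List<LazyNode> loaded = new ArrayList<>(targets.length);
            for (long target : targets) {
                loaded.add(new LazyNode(target, referenceReader));
            }
            references = loaded;
        }
        return references;
    }

    boolean isLoaded() {
        return references != null;
    }
}
```

The sketch caches every list it loads, which is the simplest choice; for
dumps much larger than memory, some form of eviction would probably be
needed as well.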

(more to figure out...)

Please feel free to share your comments about all the items above, and
to add more....

