incubator-kato-spec mailing list archives

From Stuart Monteith <stuk...@stoo.me.uk>
Subject Re: JSR 326 and Apache Kato - A "state of the nation" examination
Date Mon, 18 Jan 2010 11:10:20 GMT
My 2p too...


David Griffiths wrote:
> Hi Steve, this is my 2p (and definitely not IBM's 2p):
>
> I think you're heading off in the wrong direction. In one of your
> messages you said that developers didn't initially "get" the concept
> of using dumps for debugging their application problems. I'm with
> those developers. I think application developers already have a wealth
> of tools to assist with debugging and profiling their apps and they
> should continue to use those.
>
>    
Part of the value of Kato is that it should allow you to examine problem
state without attaching a debugger to, or profiling, your application (or
application server) while it is running.
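To make that concrete, here is a rough sketch of what post-mortem examination
looks like through this kind of API. The package and class names below are the
DTFJ ones that Kato descends from, used purely as an illustration (the Kato
names will differ); the point is the traversal from a dump file down to the
Java threads, with no debugger attached and no running application needed:

    import java.io.File;
    import java.util.Iterator;

    import com.ibm.dtfj.image.Image;
    import com.ibm.dtfj.image.ImageAddressSpace;
    import com.ibm.dtfj.image.ImageFactory;
    import com.ibm.dtfj.image.ImageProcess;
    import com.ibm.dtfj.java.JavaRuntime;
    import com.ibm.dtfj.java.JavaThread;

    public class ListThreads {
        public static void main(String[] args) throws Exception {
            // The factory implementation is vendor-specific; this is the class
            // name documented for IBM's JVMs, used here only as an example.
            ImageFactory factory = (ImageFactory) Class
                    .forName("com.ibm.dtfj.image.j9.ImageFactory").newInstance();
            Image image = factory.getImage(new File(args[0])); // e.g. a core file

            // Walk from the native view of the dump down to the Java view.
            // (Corrupt-data handling is omitted to keep the sketch short.)
            for (Iterator as = image.getAddressSpaces(); as.hasNext();) {
                ImageAddressSpace space = (ImageAddressSpace) as.next();
                for (Iterator ps = space.getProcesses(); ps.hasNext();) {
                    ImageProcess process = (ImageProcess) ps.next();
                    for (Iterator rs = process.getRuntimes(); rs.hasNext();) {
                        JavaRuntime runtime = (JavaRuntime) rs.next();
                        for (Iterator ts = runtime.getThreads(); ts.hasNext();) {
                            JavaThread thread = (JavaThread) ts.next();
                            System.out.println(thread.getName());
                        }
                    }
                }
            }
        }
    }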
> The gap in the market that I think Kato should be addressing is
> analyzing post-mortem dumps in a production environment. First-failure
> data capture (FFDC) dumps where you don't always have the opportunity
> to set all the options to give you exactly the type of dump or trace
> you'd like. And I think you should be targeting the "image" part of
> the API as much as the Java part. Give us access to as much info as
> possible to debug a problem.
>
>    
I agree with this. I always envisaged FFDC as a major use for DTFJ and Kato.
I'm more sceptical about the Image part of the API, as a general pure Java
problem isn't going to manifest itself in the Image API. Having said that, I
do see its use, but I don't believe it is the first priority, and it is the
furthest removed from a Java application developer's view.
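For contrast, the Image part of the API is the native, process-level view of a
dump. A minimal sketch (again borrowing DTFJ names as an illustration) shows
the kind of information it exposes - command line, environment, loaded native
libraries - which is exactly why a pure Java bug won't show up at that level:

    import java.util.Iterator;
    import java.util.Properties;

    import com.ibm.dtfj.image.Image;
    import com.ibm.dtfj.image.ImageAddressSpace;
    import com.ibm.dtfj.image.ImageModule;
    import com.ibm.dtfj.image.ImageProcess;

    public class NativeView {
        // "image" would be obtained from an ImageFactory, as in the sketch above.
        static void describe(Image image) throws Exception {
            for (Iterator as = image.getAddressSpaces(); as.hasNext();) {
                ImageAddressSpace space = (ImageAddressSpace) as.next();
                ImageProcess process = space.getCurrentProcess();
                System.out.println("command line: " + process.getCommandLine());

                Properties env = process.getEnvironment(); // native environment
                System.out.println("PATH=" + env.getProperty("PATH"));

                // Native libraries loaded into the process (.so / .dll files).
                for (Iterator libs = process.getLibraries(); libs.hasNext();) {
                    ImageModule lib = (ImageModule) libs.next();
                    System.out.println("library: " + lib.getName());
                }
            }
        }
    }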
> This is the background of Kato. It is descended from DTFJ which is an
> internal IBM API for analyzing dumps. It is not limited to being used
> just by JVM service people to solve bugs in the JVM. Dan Julin can
> vouch for the fact that DTFJ is used widely by WebSphere to debug
> WebSphere application level issues. In fact Dan is one of the main
> customers for DTFJ.
>
>    
The DTFJ API is not an internal API. It is documented in the Diagnostics
Guide and is shipped and supported in IBM's JVMs.
> I think you should forget about trying to define your own snapshot
> dump format and concentrate instead on providing access to core dumps
> which already contain all the information and more that we need. The
> support for analyzing core files is poor. It's crazy that so many
> people are still using gdb/dbx/etc rather than pure Java solutions.
>    
I think we should be working on core file readers as they can solve the
majority of use cases. But there are downsides. For one, the Sun HotSpot JVM
is at best GPL licensed, which is incompatible with the Apache License and
puts us at risk; and secondly, without active involvement from Sun, a HotSpot
reader is unlikely to be maintainable. The DRLVM would be an interesting
direction to go in, but there are doubts about its adoption and its future.
> I think Kato should be mainly an API with maybe a reference
> implementation for some JVM on Linux. I don't understand why HotSpot
> is such an issue. It's up to either Sun or some third-party to provide
> a binary implementation of the Kato API for HotSpot core files. This
> should not be difficult to achieve.
>
>    
The issue is that we are developing a JSR as well as an Apache Incubator
project. For this we need a specification, a TCK, and most importantly, a
reference implementation. The reason for the interest in the HotSpot JVM is
that it is the JVM with the most market penetration.

> Easy problems already have easy solutions and plenty of them. It's the
> big complex production environments I think we should be targeting.
> The demand to analyze core dumps is there, what's missing is a
> JVM-neutral solution.
>
>    
This project's goals have evolved during its lifetime, through changing
constraints and through the feedback received from various parties. As a
result this isn't simply open-sourced DTFJ. We do want there to be interest
in the project, so solving the problems in the "big complex production
environments" may be less of a priority than the everyday simple problems
developers may find on their desktop. It seems we might have to solve the
latter before there'll be enough interest in the former.

Regards,
     Stuart


> Cheers,
>
> Dave
>
> On Wed, Jan 13, 2010 at 9:03 PM, Steve Poole<spoole167@googlemail.com>  wrote:
>    
>> Greetings all,
>>
>> Discussions this year have got off to a good start and we're also really
>> close to providing that first driver which contains the changes we've
>> discussed over time. With that in mind I think it's worth examining the
>> past, present and future of this work.
>>
>> *A brief recap *
>>
>> We've been working on this JSR for some time - since 5 August 2008
>> <http://jcp.org/en/jsr/detail?id=326> to be precise.
>>
>> At the start of the project we expected to be able to develop what I
>> called the "legs" under the code contributed by IBM. These "legs" were
>> intended to map the API to the dumps that were available from a Sun JVM -
>> including being able to read HotSpot data from a core file. We also
>> expected to drive quickly towards discussing the form of the future - how to
>> deal with titanic dumps and how not to have dumps at all.
>>
>> Most of this didn't happen. We did write an HPROF reader but we didn't
>> manage to develop a core file reader for the HotSpot JVM. In that regard
>> we also examined the Serviceability Agent API
>> <http://www.usenix.org/events/jvm01/full_papers/russell/russell_html/index.html>
>> but there were too many restrictions on use and operating environment. It
>> turned out that it was not feasible for Apache Kato to develop a core file
>> reader for HotSpot due to licensing issues and, more importantly, a lack of
>> skills in HotSpot.
>>
>> At that point we were somewhat stuck (I did discuss this problem privately
>> with various JVM vendors but we did not reach a resolution).
>>
>> All was not lost - we wrote a prototype (in Python!) of a new dump that
>> used JVMTI. The dump was the first to contain local variables. We hooked it
>> up to the Java debugger through our JDI connector to show that you could
>> use a familiar interface to analyse your problem. Java DBX
>> <http://en.wikipedia.org/wiki/Dbx_%28debugger%29> for core files had
>> arrived.
>>
>> We also tacked on a JXPath <http://commons.apache.org/jxpath/> based layer
>> (now in the KatoView tool) that allowed you to query the API without writing
>> reams of code.
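For anyone who hasn't come across JXPath: it evaluates XPath-like expressions
against ordinary Java object graphs, which is what makes a query layer like
KatoView possible. A self-contained toy sketch of the idea follows - the two
little classes and the query path are invented for illustration and are not
the actual Kato object model or KatoView syntax:

    import java.util.Arrays;
    import java.util.Iterator;
    import java.util.List;

    import org.apache.commons.jxpath.JXPathContext;

    public class JXPathSketch {
        // Toy stand-ins for the dump object model, purely for illustration.
        public static class ThreadInfo {
            private final String name;
            public ThreadInfo(String name) { this.name = name; }
            public String getName() { return name; }
        }
        public static class RuntimeInfo {
            public List<ThreadInfo> getThreads() {
                return Arrays.asList(new ThreadInfo("main"),
                                     new ThreadInfo("Worker-1"));
            }
        }

        public static void main(String[] args) {
            JXPathContext ctx = JXPathContext.newContext(new RuntimeInfo());
            // One XPath-like expression instead of nested loops and casts:
            // the names of all threads reachable from the root object.
            for (Iterator<?> it = ctx.iterate("threads/name"); it.hasNext();) {
                System.out.println(it.next());
            }
        }
    }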
>>
>> We took JSR 326 to San Francisco and showed people what we had at JavaOne
>> BOF4870 <http://cwiki.apache.org/confluence/display/KATO/BOF4870> and I
>> got to meet a few of you face to face for the first time.
>>
>> After JavaOne we rewrote the Python prototype in C and started to bring
>> the first Early Draft Review
>> <http://cwiki.apache.org/KATO/jsr326specification.data/jsr326-edr-1-2009-08-21.pdf>
>> together, although it took a long time to get the EDR onto the JCP site.
>> That was mostly down to my learning a new process and dealing with a
>> licensing concern, where I learned about the concept of "collective
>> copyright" <http://en.wikipedia.org/wiki/Copyright_collective>.
>>
>> After the EDR was out we started work on the first code release from Apache
>> Kato (all new stuff to learn). We still hadn't resolved the mismatch between
>> what data the API said it could offer and our inability to provide said data
>> (i.e. no HotSpot support). The answer was to factor out the relationship
>> between Java entities and native code entities and make it optional. Now
>> those dumps that know nothing about processes or address spaces or even
>> pointers are not required to fake them.
>>
>> Finally, and quite recently, we added into the API the first attempt at a
>> standard dump trigger mechanism, and we added an additional dump type that
>> will help us as we develop the snapshot and optionality designs.
>>
>> *Today *
>>
>> Let's look to the present. It's January 2010 and there is a foot of snow
>> outside my window, which is unusual for where I live. What else is unusual
>> is that we have an Expert Group which has been so very quiet. It's time to
>> examine our situation and discuss what else it is that we need to do to
>> make this project a success.
>>
>> At the highest level we need at least 4 things:
>>
>>    1. A design that will address our requirements.
>>    2. A matching implementation that supports a high percentage of this
>>       design.
>>    3. Adoption by JVM vendors.
>>    4. A user community.
>>
>>
>>
>> *Design*
>>
>> Do you know what our requirements are? The original proposal for Kato is
>> here <http://wiki.apache.org/incubator/KatoProposal> and the JSR is
>> here <http://jcp.org/en/jsr/detail?id=326>.
>>
>> Are these documents saying what you expected and want? The Early Draft
>> Review
>> <http://cwiki.apache.org/KATO/jsr326specification.data/jsr326-edr-1-2009-08-21.pdf>
>> outlines more.
>>
>>
>> *Implementation *
>>
>> We're going to provide a binary driver as soon as we possibly can for you
>> all to use - but you can check out the code and try building and using it
>> now. We still have a technical hurdle: we are hampered by our inability
>> to make JVM modifications if necessary. How should we resolve this?
>> Remember that we have to be able to provide a Reference Implementation to
>> match the specification. We can legitimately justify having some edge
>> conditions that are not implemented, but it's no use to anyone if key parts
>> of the API are not implemented. Having said that, it is reasonable to
>> consider a middle ground where we specify a new JVM interface that we
>> require to be provided by JVM vendors. It depends on technical
>> circumstances, but that approach has more flexibility in implementation -
>> it's likely going to be easier to ask a JVM vendor to provide data to a new
>> standardized API. My current thinking is that for now we minimise this
>> situation as much as possible and live with slower implementations, at
>> least until we've resolved the outstanding questions of adoption by JVM
>> vendors.
>>
>> I think we've come to realize that the desire to be able to extract
>> information about a HotSpot JVM from a core file is not going to be
>> satisfied, and is actually not necessary. We've said right from the
>> beginning that dump sizes are growing and we need to consider having
>> smaller dumps. Rather than finding a way to read HotSpot data from a core
>> file, we just move directly to defining and implementing what a Snapshot
>> Dump mechanism really is. My expectation is that we will only need JVM
>> support for a yet-to-be-designed low-level API which we can use to extract
>> information from a running JVM. I really don't know what form that API
>> would take - it might be something like JVMTI, it might be a set of native
>> methods, or it may just be new Java classes that the JVM vendor replaces.
>>
>> What drives this discussion, and hence defines what we need from the JVM
>> vendors, is having the Snapshot concept clear in everyone's head.
>> Since this is new to everyone I want to provide an implementation that
>> embodies the concepts as soon as possible so we can argue this through
>> from a practical, hands-on perspective.
>>
>>
>> *Adoption by JVM Vendors*
>>
>> Adoption by JVM vendors - and by that we mainly mean Sun and Oracle, since
>> IBM already has a similar implementation - is predicated on usefulness and
>> on the need for JVM-specific code. If there is no requirement for
>> JVM-specific changes then adoption is not really an issue. If we have to
>> have JVM changes (and we will in the end) then we need either to have
>> Sun/Oracle or another JVM vendor develop them, or to find a third party who
>> is willing to develop a GPL-licensed extension to OpenJDK to support our
>> requirements.
>>
>> We're going to have to wait a few weeks until the Oracle/Sun acquisition is
>> completed before we can expect to get a sensible answer to the first
>> question. It's also possible that we could go straight to the OpenJDK folk
>> and see if they wanted to play. In either case though, we would need to
>> have a good idea of the type of JVM changes and/or new data access we
>> need.
>>
>> *User Community *
>>
>> We need to agree who our users actually are. I know that there are various
>> views, but let's get it clear. My view is that our users are the tools
>> vendors and the more expert application programmers out there. This API may
>> make life easier for the JVM vendor, but only in passing. The major
>> objective is to help programmers solve their own problems, not to help JVM
>> vendors fix bugs in the JVM. Do you agree?
>>
>> What else makes a user community? Having something to use is high up the
>> list. We need to get what we have out the door and being used. It's not as
>> simple as that, of course - we need documentation and usage examples, and
>> most importantly we need to be able to offer a compelling reason for using
>> our API. Right now we're light on all of these.
>>
>>
>>
>> *What the future holds*
>>
>> I can't say how much I really appreciate all the time and effort that people
>> have expended so far on this project. It is a shame that we've not had the
>> larger buy-in where we expected it, but that may change. I intend to keep
>> asking.
>>
>> Right now though I need to ask more of you: you as a subscriber to this
>> mailing list, you as a member of the Expert Group, you as a contributor or
>> committer to Apache Kato, and you as a potential user of JSR 326. I need you
>> to tell me whether we are on the right track - are we going in the right
>> direction or not? If we are doing what you expected, say so as well: it's
>> good to get confirmation. If we're not addressing the issues you consider
>> need to be talked about - say so. If you can help with documentation,
>> use cases, evangelism, coding, testing, or anything else - just say so.
>>
>>
>> In my view the future of this project  ranges from being *just* the place
>> where a new modern replacement for HPROF is developed all the way through to
>> delivering on those objectives we set ourselves in 2008.  I need your help
>> and active involvement right now  in determining our actual future.
>>
>> Thanks
>>
>> --
>> Steve
>>
>>      

-- 
Stuart Monteith
http://blog.stoo.me.uk/

