incubator-kato-spec mailing list archives

From "Andreas Grabner" <andreas.grab...@dynatrace.com>
Subject RE: "Snapshot" support
Date Wed, 21 Oct 2009 14:32:22 GMT
Hi Steve - find my answers below (lines starting with "AG:")

Let me know how you want to split up this thread as we are discussing
multiple topics here

thanks

-----Original Message-----
From: Steve Poole [mailto:spoole167@googlemail.com] 
Sent: Samstag, 17. Oktober 2009 08:11
To: kato-spec@incubator.apache.org
Subject: Re: "Snapshot" support

On Wed, Oct 14, 2009 at 2:55 PM, Andreas Grabner <
andreas.grabner@dynatrace.com> wrote:

> Steve
>
> Thanks Andreas -  this is good stuff.  Questions below.

I propose we continue to discuss on this thread, but with the aim of
pulling out the top-level items as we go. For instance, you've
obviously got more requirements on JVMTI, and we should pull that out
as a separate thread. I'll do that once you've answered a few of my
questions below.

Thanks again


>
> I am following up on Alois' email with some use cases that we have
> with our clients. Based on those use cases we also derived
> requirements.
>
>
>
> Use Case: Really Large Memory Dumps don't Scale
>
> Most of our enterprise customers run their applications on 64-bit
> systems with JVMs having more than 1.5GB heap space.
>
>
Do you have info on what the largest heap size is that you've
encountered?

AG: We have seen heap sizes of 8GB

> Iterating through Java objects on the heap doesn't scale with growing
> heap sizes. Due to the object tagging that creates a tag for every
> object on the heap, we quickly exhaust the native memory.
>
> Do you have to tag all objects?

AG: Our Memory Snapshot feature visualizes the referrer tree of
objects. I believe that for this we need to create tags for each
individual object in order to get the reference information - unless
there is another way of walking the referrer tree that we are not
aware of?
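To illustrate why per-object tags hurt at this scale, here is a
back-of-envelope sketch. The per-tag overhead figures are assumptions
for illustration, not measured JVMTI internals: a tag is a jlong
(8 bytes), and the JVM needs some internal bookkeeping (assumed ~24
bytes here) to map objects to tags.

```java
public class TagOverhead {
    // Hypothetical cost per tagged object: an 8-byte jlong tag plus an
    // assumed ~24 bytes of internal tag-map bookkeeping in the JVM.
    // These figures are illustrative only.
    static final long BYTES_PER_TAGGED_OBJECT = 8 + 24;

    static long estimatedTagMemoryBytes(long objectCount) {
        return objectCount * BYTES_PER_TAGGED_OBJECT;
    }

    public static void main(String[] args) {
        // An 8GB heap can easily hold ~200 million small objects.
        long objects = 200_000_000L;
        System.out.printf("~%.1f GB of native memory just for tags%n",
                estimatedTagMemoryBytes(objects) / 1e9);
    }
}
```

Even with these conservative assumptions, tagging every object on an
8GB heap costs gigabytes of native memory before any analysis begins.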


> Using the current JVMTI/PI APIs doesn't allow us to iterate over the
> heap for large heaps in a timely manner or without running into memory
> issues -> Large Memory Dumps are often not possible!!
>

> Use Case: Provide full context information in OutOfMemory situation
>
> Capturing dump information in case of an OutOfMemory exception is key
> to understanding the root cause of this event.
>
> Access to the JVMTI interfaces to iterate the objects on the heap is
> not possible at this point in time, which makes it impossible to
> collect heap information in the same way as when creating a dump
> during "normal" runtime execution.
>
> Therefore no detailed memory dumps can be made in the event handler
> of an OutOfMemory exception!!
>

I don't understand this - the Eclipse MAT tool does detailed OOM
analysis, and that uses the HPROF file for Sun JVMs and other dumps for
IBM JVMs - do you have extra requirements beyond what MAT can offer?

AG: In the next use case I explain why we prefer not to use dump files
for analysis but rather have a "central approach" where our agent (that
sits in the JVM) can grab all the information needed to perform OOM
analysis. In large distributed environments it's not feasible to start
collecting log files from different servers. With our agent technology
we can collect this data from within the JVM and send it off to our
central server that manages all JVMs in the system.
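For reference, the closest in-process hook HotSpot JVMs offer is the
com.sun.management.HotSpotDiagnosticMXBean, which an in-JVM agent can
call programmatically. Note it still writes an HPROF file to disk,
which is exactly the limitation described above; this sketch uses the
getPlatformMXBean API of later JDKs (Java 7+) - in 2009 the same bean
was reachable via ManagementFactory.newPlatformMXBeanProxy.

```java
import java.lang.management.ManagementFactory;
import com.sun.management.HotSpotDiagnosticMXBean;

public class InProcessDump {
    // Returns the HotSpot diagnostic bean, or null on JVMs that
    // do not expose it.
    static HotSpotDiagnosticMXBean diagnosticBean() {
        return ManagementFactory
                .getPlatformMXBean(HotSpotDiagnosticMXBean.class);
    }

    public static void main(String[] args) throws Exception {
        HotSpotDiagnosticMXBean bean = diagnosticBean();
        // Only write the dump when a target path is supplied, e.g.
        //   java InProcessDump app-heap.hprof
        if (bean != null && args.length > 0) {
            // HPROF-format dump of live objects only.
            bean.dumpHeap(args[0], true);
        }
    }
}
```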

>
>
>
> Use Case: Central Management of Memory Dumps in Distributed Enterprise
> Applications
>
> Most of our enterprise customers run their distributed applications
> on multiple servers hosting multiple JVMs. Creating and analyzing
> memory dumps in these distributed scenarios must be centrally
> manageable.
>
> Creating dump files and storing them on the server machines is not a
> perfect solution because
>
> a)       Getting access to dump files on servers is often restricted
> by security policies
>
> b)       Local disk space is required
>
> Therefore a dump file approach is not an option for most of our
> customers!!
>
>  Understood.

>
>
>
> Requirements based on use cases
>
> *       Limit the native memory usage when iterating through objects
>
>        *       Eliminate the need for an additional object tag
> requiring native memory -> this can exhaust native memory when having
> millions of objects
>        *       Instead of using the tag, use the object pointer
>        *       Must ensure that the object pointer stays constant (no
> objects are moved) throughout the iteration operation
>
>
> *       Enable JVMTI interface access to iterate through heap objects
> in case of resource exhaustion (OutOfMemory)
>
>        *       Having full access to all object heap interface
> functions allows us to capture this information in case of an OOM
>        *       Also have access to JVMTI interfaces for capturing
> stack traces
>        *       Can some part of this information also be made
> available in case of a more severe JVM crash?
>
> *       Native Interface for memory dump generation
>
> That's an interesting idea - we were expecting to provide a Java API
> to do that. Having a native version could easily make sense.

AG: Native would be our preference


>        *       In order to centrally manage memory dumps we need to be
> able to do it via a native interface within the JVM
>

I don't understand why you need to manage dumps via a native
interface.

AG: We have an agent that lives in the JVM. This agent sends memory
information to our central dynaTrace Server. This allows us to do
central management of all connected JVMs. I mentioned earlier that
working via dump files doesn't always work with our clients (security
policies, disk space, ...). As there is JVMTI already - why not extend
this API? We are also OK with a Java API as long as it works within
the JVM and not on dump files, and as long as the performance is not a
problem (compared to a native implementation).


>        *       JVMTI would be a perfect candidate assuming the
> existing limitations can be addressed
>        *       A separate native interface would be an alternative
> option
>
> Agree - but in either case addressing the usage issues with JVMTI
> will come down to understanding why JVMTI looks like it does now and
> how other approaches may affect the runtime performance of a system.

AG: Agreed. Performance is a big topic for us. Getting this kind of
information must work fast - nobody wants to wait hours to grab a
detailed memory snapshot.

>
>
> Additional requirements (maybe not in the scope of this JSR)
>
>
I think these are all worth discussing - if you use the info then we
should explore if it makes sense to specify it.



> *       Access to Objects in PermGen
> *       Generation information when iterating through objects
>
>        *       which generation each live object on the heap is in
>
> *       Get access to Generation Sizes via JVMTI
>
>        *       Size information is available via JMX
>        *       so it should also be made available via the native
> interfaces
>
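As a point of comparison, the JMX side mentioned above looks roughly
like this; pool names such as "PS Old Gen" or "Tenured Gen" vary by
JVM and collector, so the mapping from pools to generations is only an
assumption of the sketch.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;

public class GenerationSizes {
    public static void main(String[] args) {
        // Each memory pool roughly corresponds to a generation
        // (eden, survivor, old/tenured, perm), depending on the
        // collector in use.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            System.out.printf("%-25s used=%d committed=%d%n",
                    pool.getName(),
                    pool.getUsage().getUsed(),
                    pool.getUsage().getCommitted());
        }
    }
}
```

This is the information that is available via JMX today but has no
equivalent on the JVMTI side.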
> *       Object Information on GC finished event
>
>        *       get information about how many objects have been
> moved/freed (either real object IDs or at least the size)
>        *       must be able to turn this feature on/off during
> runtime to keep overhead low when not needed
>
>
>
> Let me know if any of these use cases or requirements needs further
> explanation.
>
>
>
> Thanks
>
> Andi & Alois
>
>
>
>
>
> -----Original Message-----
> From: Alois Reitbauer
> Sent: Montag, 21. September 2009 17:01
> To: kato-spec@incubator.apache.org
> Cc: Andreas Grabner
> Subject: RE: "Snapshot" support
>
>
> Steve,
>
> we will be happy to contribute our use cases. I propose to start with
> memory dumps first and thread dumps later. Either Andi or I will come
> back with some concrete use cases.
>
> - Alois
>
>
> -----Original Message-----
> From: Steve Poole [mailto:spoole167@googlemail.com]
> Sent: Dienstag, 08. September 2009 06:31
> To: kato-spec@incubator.apache.org
> Subject: "Snapshot" support
>
> One of the capabilities that this API is intended to provide is
> support for "Snapshots".
>
> This is based on the idea that for various reasons the dumps that we
> can get today can be too big, take too long to generate, not have the
> right information etc.
>
> Also we need to recognise that dumps are not only produced to help
> diagnose a failure. Some users consume dumps as part of monitoring a
> live system.
>
> So we need to discuss (at least)
>
> a)  How dump content configuration would work
> b)  What sorts of data are needed in a snapshot dump
>
> This is the largest outstanding piece of the API. Now with Alois and
> Andreas on board we can start to clarify use cases and resolve the
> design.
>
> Cheers
>
> Steve


-- 
Steve
