From: "Alois Reitbauer" <alois.reitbauer@dynatrace.com>
To: kato-spec@incubator.apache.org
Date: Tue, 17 Nov 2009 10:38:43 +0100
Subject: RE: "Snapshot" support

Steve,

good to see these requirements moving into the JSR. I also agree that the
internal structures are implementation-specific. The vital point is that
tooling providers like us need a standardized way of accessing this
information from within the JVM. I think a lot of the current problems are
caused because JVMTI/PI specify how a JVM vendor has to do certain things
instead of defining which data should be returned and how to work with
that data.

- Alois

-----Original Message-----
From: Steve Poole [mailto:spoole167@googlemail.com]
Sent: Tuesday, 17 November 2009 10:31
To: kato-spec@incubator.apache.org
Subject: Re: "Snapshot" support

Andreas, thanks for this info.
I don't think much of what you've described is outside the JSR scope at the
moment. The JSR can deal with standardising the collection of data
(including statistics) and when dumps are triggered. The form of the data
produced is intentionally implementation-specific, but the data-reading API
would be part of the standard. I think it's worth splitting out the topics
I've outlined into separate threads for further discussion, so I'll create
threads for discussing statistics and dump triggers.

On Mon, Nov 16, 2009 at 6:42 PM, Andreas Grabner <
andreas.grabner@dynatrace.com> wrote:

> Thanks Steve. This email seems to grow a lot - so I placed your current
> questions at the top - see my combined answer below
>
> A) What data is collected and how do you define the criteria
> B) What types of analysis take place and (apart from just accessing the
> data sent) what types of cross-collection information is required (I'm
> thinking about object correlation)
> C) How much data and how often is it required
>
> Our goal is to collect heap information from a remote location (our
> collectors). We have two options in terms of the depth of data. The
> Simple Dump only includes the number of instances per class. The Extended
> Dump also includes object references. Data collection is either triggered
> manually, scheduled (e.g. every 30 minutes during a load test) or
> triggered (in case of a certain event, e.g. heavy memory usage). The
> collected "raw" data is sent from the JVM to the central collector, which
> analyzes the raw data in terms of, e.g., the most used classes, walking
> the referrer tree, ...
>
> Our challenges/requests are
>
> * that this is very slow for the very large heap sizes we see with our
> clients
> * that we don't get this information in case of a severe runtime problem,
> e.g. OutOfMemory
> * that, in addition, we want more information about the individual
> objects on the heap, e.g. which generation they live in
>
> Some of this exceeds the scope of this JSR - please advise on what we
> should keep in here and what should be discussed elsewhere
>
> Thanks
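On the "triggered (in case of a certain event, e.g. heavy memory usage)"
case above: the standard java.lang.management API already lets an
in-process agent arm a usage threshold on the heap pools and react when it
is crossed. A minimal sketch follows; the dumpAction callback and the 90%
threshold are illustrative assumptions, standing in for whatever snapshot
mechanism and policy the JSR eventually specifies.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryNotificationInfo;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryType;
    import javax.management.Notification;
    import javax.management.NotificationEmitter;
    import javax.management.NotificationListener;

    public class UsageTriggeredDump {

        /** Arms a "heavy memory usage" trigger on every heap pool that supports one. */
        public static void install(final Runnable dumpAction) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                if (pool.getType() == MemoryType.HEAP && pool.isUsageThresholdSupported()) {
                    long max = pool.getUsage().getMax();
                    if (max > 0) {
                        // Fire when the pool is 90% full; the threshold value
                        // is a policy decision, not part of the API.
                        pool.setUsageThreshold((long) (max * 0.9));
                    }
                }
            }
            // The platform MemoryMXBean is documented to be a NotificationEmitter.
            NotificationEmitter emitter =
                    (NotificationEmitter) ManagementFactory.getMemoryMXBean();
            emitter.addNotificationListener(new NotificationListener() {
                public void handleNotification(Notification notification, Object handback) {
                    if (MemoryNotificationInfo.MEMORY_THRESHOLD_EXCEEDED
                            .equals(notification.getType())) {
                        // Hypothetical hook: take a snapshot and ship it to the collector.
                        dumpAction.run();
                    }
                }
            }, null, null);
        }
    }

This only covers the trigger side; what the callback should capture, and
how the data travels to the collector, is the open design question in this
thread.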
> -----Original Message-----
> From: Steve Poole [mailto:spoole167@googlemail.com]
> Sent: Wednesday, 4 November 2009 12:11
> To: kato-spec@incubator.apache.org
> Subject: Re: "Snapshot" support
>
> Sorry Andreas - been out on vacation. Made a quick reply below and I
> will do a more detailed response later.
>
> On Wed, Nov 4, 2009 at 11:46 AM, Andreas Grabner <
> andreas.grabner@dynatrace.com> wrote:
>
> > Hi Steve - following up on the email I sent out two weeks ago. See my
> > answers below starting with "AG:"
> >
> > -----Original Message-----
> > From: Andreas Grabner
> > Sent: Wednesday, 21 October 2009 10:41
> > To: kato-spec@incubator.apache.org
> > Subject: RE: "Snapshot" support
> >
> > Hi Steve - find my answers below (lines starting with "AG:")
> >
> > Let me know how you want to split up this thread, as we are discussing
> > multiple topics here
> >
> > thanks
> >
> > -----Original Message-----
> > From: Steve Poole [mailto:spoole167@googlemail.com]
> > Sent: Saturday, 17 October 2009 08:11
> > To: kato-spec@incubator.apache.org
> > Subject: Re: "Snapshot" support
> >
> > On Wed, Oct 14, 2009 at 2:55 PM, Andreas Grabner <
> > andreas.grabner@dynatrace.com> wrote:
> >
> > > Steve
> >
> > Thanks Andreas - this is good stuff. Questions below.
> >
> > I propose we continue to discuss on this thread but with the aim of
> > pulling out the top-level items as we go. For instance, you've
> > obviously got more requirements on JVMTI and we should pull that out
> > as a separate thread. I'll do that once you've answered a few of my
> > questions below.
> >
> > Thanks again
> >
> > > I am following up on Alois' email with some use cases that we have
> > > with our clients. Based on those use cases we also derived
> > > requirements.
> > >
> > > Use Case: Really Large Memory Dumps don't Scale
> > >
> > > Most of our enterprise customers run their applications on 64-bit
> > > systems with JVMs having > 1.5GB heap space.
> >
> > Do you have info on the largest heap size you've encountered?
> >
> > AG: We have seen heap sizes of 8GB
> >
> > > Iterating through Java objects on the heap doesn't scale with
> > > growing heap sizes. Due to the object tagging that creates a tag for
> > > every object on the heap, we quickly exhaust the native memory.
> >
> > Do you have to tag all objects?
> >
> > AG: Our Memory Snapshot feature visualizes the referral tree of
> > objects. I believe that for this we need to create tags for each
> > individual object in order to get the reference information - unless
> > there is another way of walking the referrer tree that we are not
> > aware of??
>
> For the RI and the JVMTI-based dump we've sidestepped the problem by
> working from the threads and their object references. That means we
> don't have to use tagging to find objects, but it does mean you have to
> keep track of the ones you've seen before.
>
> > > Using the current JVMTI/PI APIs doesn't allow us to iterate over the
> > > heap for large heaps in a timely manner or without running into
> > > memory issues -> large memory dumps are often not possible!!
> > >
> > > Use Case: Provide full context information in OutOfMemory situation
> > >
> > > Capturing dump information in case of an OutOfMemory exception is
> > > key to understanding the root cause of this event.
> > >
> > > Access to the JVMTI interfaces to iterate the objects on the heap is
> > > not possible at this point in time, which makes it impossible to
> > > collect heap information in the same way as when creating a dump
> > > during "normal" runtime execution.
> > >
> > > Therefore no detailed memory dumps can be made in the event handler
> > > of an OutOfMemory exception!!
> >
> > I don't understand this - the Eclipse MAT tool does detailed OOM
> > analysis, and that uses the HPROF file for Sun JVMs and other dumps
> > for IBM JVMs - do you have extra requirements beyond what MAT can
> > offer?
> >
> > AG: In the next use case I explain why we prefer not to use dump files
> > for analysis but rather have a "central approach" where our agent
> > (that sits in the JVM) can grab all the information needed to perform
> > OOM analysis. In large distributed environments it's not feasible to
> > start collecting log files from different servers. With our agent
> > technology we can collect this data from within the JVM and send it
> > off to our central server that manages all JVMs in the system.
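To make Steve's "work from the threads and their object references" idea
above concrete, here is a sketch of such a traversal. The
JavaRuntime/JavaThread/JavaObject interfaces are hypothetical placeholders
(the JSR has not defined these names); the point is that a visited set on
the Java heap replaces per-object native tags.

    import java.util.ArrayDeque;
    import java.util.Collections;
    import java.util.Deque;
    import java.util.IdentityHashMap;
    import java.util.Set;

    // Hypothetical snapshot-API types, standing in for whatever the JSR defines.
    interface JavaObject { Iterable<JavaObject> getReferences(); }
    interface JavaThread { Iterable<JavaObject> getRootReferences(); }
    interface JavaRuntime { Iterable<JavaThread> getThreads(); }

    public class ReferenceWalker {

        public interface Visitor { void visit(JavaObject object); }

        /** Visits every object reachable from thread roots exactly once. */
        public static void walk(JavaRuntime runtime, Visitor visitor) {
            // The visited set replaces JVMTI-style per-object tags: its cost
            // is proportional to the number of objects actually reached, and
            // it lives on the Java heap rather than in native memory.
            Set<JavaObject> seen =
                    Collections.newSetFromMap(new IdentityHashMap<JavaObject, Boolean>());
            Deque<JavaObject> pending = new ArrayDeque<JavaObject>();
            for (JavaThread thread : runtime.getThreads()) {
                for (JavaObject root : thread.getRootReferences()) {
                    if (seen.add(root)) {
                        pending.push(root);
                    }
                }
            }
            while (!pending.isEmpty()) {
                JavaObject current = pending.pop();
                visitor.visit(current);
                for (JavaObject reference : current.getReferences()) {
                    if (seen.add(reference)) {
                        pending.push(reference);
                    }
                }
            }
        }
    }

Note that a walk rooted only in threads misses objects held alive by class
statics and other GC roots, so a real traversal would need the same
treatment for every root kind - one reason to specify the externals rather
than the JVMTI internals.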
> > > Use Case: Central Management of Memory Dumps in Distributed
> > > Enterprise Applications
> > >
> > > Most of our enterprise customers run their distributed applications
> > > on multiple servers hosting multiple JVMs. Creating and analyzing
> > > memory dumps in these distributed scenarios must be centrally
> > > manageable.
> > >
> > > Creating dump files and storing them on the server machines is not a
> > > perfect solution because
> > >
> > > a) getting access to dump files on servers is often restricted by
> > > security policies
> > > b) local disk space is required
> > >
> > > Therefore a dump-file approach is not an option for most of our
> > > customers!!
> >
> > Understood.
> >
> > > Requirements based on use cases
> > >
> > > * Limit the native memory usage when iterating through objects
> > >   * Eliminate the need for an additional object tag requiring native
> > >     memory -> this can exhaust native memory when there are millions
> > >     of objects
> > >   * Instead of using the tag, use the object pointer
> > >     * Must ensure that the object pointer stays constant (no objects
> > >       are moved) throughout the iteration operation
> > >
> > > * Enable JVMTI interface access to iterate through heap objects in
> > >   case of resource exhaustion (OutOfMemory)
> > >   * Having full access to all object heap interface functions allows
> > >     us to capture this information in case of an OOM
> > >   * Also have access to JVMTI interfaces for capturing stack traces
> > >   * Can some part of this information also be made available in the
> > >     case of a more severe JVM crash?
> > >
> > > * Native interface for memory dump generation
> >
> > That's an interesting idea - we were expecting to provide a Java API
> > to do that. Having a native version could easily make sense.
> >
> > AG: Native would be our preference
> >
> > > * In order to centrally manage memory dumps we need to be able to do
> > >   it via a native interface within the JVM
> >
> > I don't understand why you need to manage dumps via a native
> > interface?
> >
> > AG: We have an agent that lives in the JVM. This agent sends memory
> > information to our central dynaTrace Server. This allows us to do
> > central management of all connected JVMs. I mentioned earlier that
> > working via dump files doesn't always work with our clients (security
> > policies, disk space, ...). As there is JVMTI already - why not extend
> > this API? We are also OK with a Java API as long as it works within
> > the JVM and not on dump files, and as long as the performance is not a
> > problem (compared to a native implementation).
>
> hmm - ok, so we do need to be careful about not straying into the world
> of tracing. The JSR is intended to cover static data sets, even if they
> are taken very frequently :-). The data doesn't have to reside in a dump
> file - it could of course just be sent down the wire to a remote
> collection point.
>
> My concern is that at this point specifying additions to JVMTI to
> improve its performance or design is too early. I appreciate that you
> want to see JVMTI improved, but we need to move the discussion up a
> level and focus on the externals (and leave the implementors the choice
> of how they make it work). If the way to make a sensible solution ends
> up requiring new or improved native-level APIs then that's fine.
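For comparison, a Java-level dump-generation API already exists in
vendor-specific form. The sketch below uses HotSpot's
com.sun.management.HotSpotDiagnosticMXBean, so it illustrates the shape
such an API can take rather than anything this JSR has agreed:

    import java.io.IOException;
    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class DumpNow {

        /** Writes an HPROF heap dump to the given file; HotSpot-specific, not standard. */
        public static void dumpHeap(String outputFile, boolean liveObjectsOnly)
                throws IOException {
            // Look up the HotSpot diagnostic bean via its well-known ObjectName.
            HotSpotDiagnosticMXBean bean = ManagementFactory.newPlatformMXBeanProxy(
                    ManagementFactory.getPlatformMBeanServer(),
                    "com.sun.management:type=HotSpotDiagnostic",
                    HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(outputFile, liveObjectsOnly);
        }
    }

It also shows the limitation Andreas keeps pointing at: the dump still
lands on local disk, so a standardized equivalent would presumably have to
allow streaming the data to a remote collection point as well.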
> My understanding is that you do the following -
>
> collect data
> send it to a collection point
> analyse it
>
> (repeat)
>
> Some of the questions we should be asking are
>
> A) What data is collected and how do you define the criteria
> B) What types of analysis take place and (apart from just accessing the
> data sent) what types of cross-collection information is required (I'm
> thinking about object correlation)
> C) How much data and how often is it required
>
> It's these sorts of questions that will help shape the API, and help us
> drive down to how we actually need to imagine it being implemented.
>
> > > * JVMTI would be a perfect candidate, assuming the existing
> > >   limitations can be addressed
> > > * A separate native interface would be an alternative option
> >
> > Agree - but in either case, addressing the usage issues with JVMTI
> > will come down to understanding why JVMTI looks like it does now and
> > how other approaches may affect the runtime performance of a system.
> >
> > AG: Agreed. Performance is a big topic for us. Getting this kind of
> > information must work fast - nobody wants to wait hours to grab a
> > detailed memory snapshot.
> >
> > > Additional requirements (maybe not in the scope of this JSR)
> >
> > I think these are all worth discussing - if you use the info then we
> > should explore whether it makes sense to specify it.
> >
> > > * Access to objects in PermGen
> > >
> > > * Generation information when iterating through objects
> > >   * which generation each live object on the heap is in
> > >
> > > * Get access to generation sizes via JVMTI
> > >   * size information is available via JMX
> > >   * so it should also be made available via the native interfaces
> > >
> > > * Object information on GC-finished event
> > >   * get information about how many objects have been moved/freed
> > >     (either real object IDs or at least the size)
> > >   * must be able to turn this feature on/off at runtime to keep
> > >     overhead low when not needed
> > >
> > > Let me know if any of these use cases or requirements needs further
> > > explanation.
> > >
> > > Thanks
> > >
> > > Andi & Alois
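On the "size information is available via JMX" item above: this is the
java.lang.management pool view, shown in the minimal sketch below. The
pool names, and which pools correspond to which generations, are
VM-specific, which is exactly the gap a standardized (or native)
equivalent would have to close.

    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryPoolMXBean;
    import java.lang.management.MemoryUsage;

    public class PoolSizes {
        public static void main(String[] args) {
            for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
                // Pool names such as "PS Old Gen" or "Eden Space" are
                // VM-specific; JMX exposes sizes but no standard notion of
                // "generation". getMax() is -1 when undefined.
                MemoryUsage usage = pool.getUsage();
                System.out.printf("%-20s %-15s used=%,d committed=%,d max=%,d%n",
                        pool.getName(), pool.getType(), usage.getUsed(),
                        usage.getCommitted(), usage.getMax());
            }
        }
    }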
> > > -----Original Message-----
> > > From: Alois Reitbauer
> > > Sent: Monday, 21 September 2009 17:01
> > > To: kato-spec@incubator.apache.org
> > > Cc: Andreas Grabner
> > > Subject: RE: "Snapshot" support
> > >
> > > Steve,
> > >
> > > we will be happy to contribute our use cases. I propose to start
> > > with memory dumps first and thread dumps later. Either Andi or I
> > > will come back with some concrete use cases.
> > >
> > > - Alois
> > >
> > > -----Original Message-----
> > > From: Steve Poole [mailto:spoole167@googlemail.com]
> > > Sent: Tuesday, 8 September 2009 06:31
> > > To: kato-spec@incubator.apache.org
> > > Subject: "Snapshot" support
> > >
> > > One of the capabilities that this API is intended to provide is
> > > support for "Snapshots".
> > >
> > > This is based on the idea that, for various reasons, the dumps we
> > > can get today can be too big, take too long to generate, not have
> > > the right information, etc.
> > >
> > > Also we need to recognise that dumps are not only produced to help
> > > diagnose a failure. Some users consume dumps as part of monitoring a
> > > live system.
> > >
> > > So we need to discuss (at least)
> > >
> > > a) how dump content configuration would work
> > > b) what sorts of data are needed in a snapshot dump
> > >
> > > This is the largest outstanding piece of the API. Now, with Alois
> > > and Andreas on board, we can start to clarify use cases and resolve
> > > the design.
> > >
> > > Cheers
> > >
> > > Steve
> >
> > --
> > Steve
>
> --
> Steve

--
Steve