incubator-kato-dev mailing list archives

From Stuart Monteith <>
Subject Re: Processing huge heap dump.
Date Sun, 10 Jan 2010 19:58:52 GMT
The hprof dump reader spends a lot of time reading the whole file, for
various reasons. The indices it holds in memory are constructed through an
initial read, which is also the source of the memory usage. In addition,
there is some correlation to be done, which also takes time and induces yet
more reading.

I'm sure some work could be done to improve the performance further, but
we'll have to look at the trade-off between disk space and memory usage.
The hprof file format itself is what it is, however, and we have no
influence over that. The CJVMTI agent has lots of room for improvement, but
I suspect it is unlikely to do much better than existing hprof
implementations. The built-in JVM hprof dumper will probably be a hard act
to follow.
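To make the "initial read" concrete: each HPROF record starts with a tag (u1), a timestamp delta (u4), and a body length (u4), so building an index of record offsets necessarily means walking the entire file once. The sketch below scans that record framing over an in-memory byte array (the real file also begins with a header - format string, identifier size, timestamp - which is omitted here for brevity):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class RecordScan {
    // Scan HPROF-style records: tag (u1), time (u4), length (u4), body.
    // Returns the offset of each record; building an index like this is
    // why the reader must touch the whole file before answering queries.
    static List<Long> indexRecords(DataInputStream in) throws IOException {
        List<Long> offsets = new ArrayList<>();
        long pos = 0;
        while (in.available() > 0) {
            offsets.add(pos);
            in.readUnsignedByte();   // tag
            in.readInt();            // timestamp delta
            int len = in.readInt();  // body length
            in.skipBytes(len);       // seek past the body
            pos += 9 + len;
        }
        return offsets;
    }

    public static void main(String[] args) throws IOException {
        // Two synthetic records with bodies of 3 and 5 bytes.
        byte[] data = new byte[9 + 3 + 9 + 5];
        data[8] = 3;          // low byte of first record's length field
        data[9 + 3 + 8] = 5;  // low byte of second record's length field
        List<Long> idx = indexRecords(
                new DataInputStream(new ByteArrayInputStream(data)));
        System.out.println(idx); // [0, 12]
    }
}
```

Note that only the 9-byte headers are read and the bodies are skipped; even so, every byte of the file must be seeked over, which is where the time goes on a 60GB dump.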

The HProf implementation is not thread-safe. Realistically, I think thread
safety is something that ought to be considered once things are more
mature. There will be algorithms that can deal with the JVM structures
sensibly.

And thanks Lukasz, it's great to have your input.


Steve Poole wrote:
> Hi Lukasz - thanks for posting.
> On Fri, Jan 8, 2010 at 7:11 PM, Lukasz<>  wrote:
>> Hello,
>> In my work I have faced a problem where I have to process a 60GB heap dump.
>> It probably wouldn't be anything scary if I had the proper hardware to
>> process such a file.
>> I've noticed that most of the tools for dump processing require:
>> a) an amount of memory at least equal to the dump size
>> b) an amount of free disk space at least equal to the dump size (to create
>> indexes)
>> Unfortunately, I don't have access to a machine where both requirements
>> are met at once.
> Yes I agree - for a) above I'd say that it's common to need 1.5 times the
> size of the original heap.
>> The processing I would like to perform is backtracking of references to
>> instances of one class (the one causing the out-of-memory). Assuming that
>> hard disk reads will be my performance bottleneck, I should be able to
>> backtrack a few levels during the night.
>> I have only a rough overview of the algorithm in my head, but it seems that
>> something like the "visitor pattern" would be enough for me.
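The backtracking Lukasz describes can be sketched roughly like this: one visiting pass over all objects builds a reverse-reference index, after which each backtracking level is just map lookups. The object IDs and the tiny graph below are invented purely for illustration:

```java
import java.util.*;

public class BacktrackSketch {
    public static void main(String[] args) {
        // Toy forward-reference graph: who -> which objects it references.
        // In a real dump these edges come from visiting instance records.
        Map<Integer, List<Integer>> refs = Map.of(
                1, List.of(2),     // root -> collection
                2, List.of(3, 4),  // collection -> two suspect instances
                3, List.of(),
                4, List.of());

        // Single "visitor" pass: record the inbound edge for every reference.
        Map<Integer, List<Integer>> inbound = new HashMap<>();
        refs.forEach((from, tos) ->
                tos.forEach(to ->
                        inbound.computeIfAbsent(to, k -> new ArrayList<>())
                               .add(from)));

        // Backtrack one level from object 3 (an instance of the suspect class).
        System.out.println(inbound.get(3)); // [2]
    }
}
```

The catch, as discussed above, is that the inbound index for a 60GB dump is itself huge, which is where the memory/disk trade-off comes in.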
>> Since I had read about Kato some time ago, I wanted to give it a try.
>> For dev purposes I have prepared a ~370MB heap dump with around 10,000,000
>> simple objects added to a collection (which probably multiplies the number
>> of objects on the heap).
> Can you share the code you used to generate the data in the dump?
>> 1) Default approach:
>> Image image = FactoryRegistry.getDefaultRegistry().getImage(dump);
>> I was waiting a few minutes, but it didn't finish processing, so it looks
>> like there will be no chance to process a 60GB dump.
> I suspect that the main reason why this is taking so long is that the HPROF
> reader has to read all of the dump first, since it doesn't know what
> questions you need answering. That's generally true of any dump reader,
> unfortunately.
>> 2) HProfFile
>> A quick look at the HProfView class gave me some idea of how I can visit
>> all objects (records) on the heap.
>> I wrote a simple app which only iterates through all records, but it also
>> turned out to be quite slow and memory consuming. Here are some metrics:
>> ---------------------------------
>> MemoryPool: PS Old Gen
>> Hello World!
>> heapDump:
>> org.apache.kato.hprof.datalayer.HProfFile$HeapDumpHProfRecord@f0eed6
>> HeapSubRecord: 100000 (946ms, 4199kB)
>> HeapSubRecord: 200000 (2064ms, 7955kB)
>> HeapSubRecord: 300000 (3123ms, 11759kB)
>> HeapSubRecord: 400000 (3933ms, 14811kB)
>> HeapSubRecord: 500000 (3908ms, 17927kB)
>> HeapSubRecord: 600000 (7269ms, 21039kB)
>> HeapSubRecord: 700000 (7736ms, 24139kB)
>> HeapSubRecord: 800000 (7866ms, 27147kB)
>> HeapSubRecord: 900000 (7753ms, 30263kB)
>> HeapSubRecord: 1000000 (7684ms, 33299kB)
>> HeapSubRecord: 1100000 (13515ms, 36487kB)
>> HeapSubRecord: 1200000 (15525ms, 39623kB)
>> HeapSubRecord: 1300000 (15405ms, 42723kB)
>> HeapSubRecord: 1400000 (15567ms, 39115kB)
>> HeapSubRecord: 1500000 (15459ms, 42203kB)
>> HeapSubRecord: 1600000 (15692ms, 43838kB)
>> HeapSubRecord: 1700000 (15424ms, 45926kB)
>> HeapSubRecord: 1800000 (15327ms, 49026kB)
>> HeapSubRecord: 1900000 (15416ms, 48505kB)
>> HeapSubRecord: 2000000 (15352ms, 51629kB)
>> -------------------------------
>> It means that iterating over the first 100,000 records took 946ms, and
>> 4199kB of OldGen was consumed.
>> Iterating over the next 100,000 records took 2064ms, and 7955kB of OldGen
>> was consumed.
>> And so on; 100,000 records is the interval for printing stats.
>> One CPU core was saturated. It also looks like the required amount of
>> memory will be equal to the dump size.
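A measurement harness in the shape of Lukasz's output can be reproduced with a few lines; the loop body below is a trivial stand-in for the real per-record parsing, and whole-heap usage is reported rather than the specific "PS Old Gen" pool for portability across collectors:

```java
import java.lang.management.ManagementFactory;

public class IterationStats {
    public static void main(String[] args) {
        long start = System.currentTimeMillis();
        int count = 0;
        // Stand-in loop: the real code would advance an HProfFile record
        // iterator here and the parsing would dominate the timings.
        for (int i = 0; i < 200_000; i++) {
            count++;
            if (count % 100_000 == 0) {
                long usedKb = ManagementFactory.getMemoryMXBean()
                        .getHeapMemoryUsage().getUsed() / 1024;
                System.out.println("HeapSubRecord: " + count + " ("
                        + (System.currentTimeMillis() - start) + "ms, "
                        + usedKb + "kB)");
            }
        }
    }
}
```

With the real reader plugged in, the steadily growing kB column is what suggests the index is being built in memory as a side effect of iteration.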
>> I could start 4 threads to make better use of the CPU, but since it looks
>> like the HProfFile instance is not thread-safe, I would have to create 4
>> instances of HProfFile, which means that the required amount of memory
>> would be something like 4 x dumpSize.
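The usual workaround for a non-thread-safe reader is thread confinement: give each worker its own instance via a ThreadLocal. The sketch below uses an invented DumpReader class to stand in for HProfFile, and a barrier to force all four pool threads to be live at once; it demonstrates exactly the cost Lukasz points out, namely that instance count (and therefore memory) scales with thread count:

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class ThreadConfinedReaders {
    // Hypothetical stand-in for a non-thread-safe reader like HProfFile.
    static class DumpReader {
        static final AtomicInteger created = new AtomicInteger();
        DumpReader() { created.incrementAndGet(); }
        long readRecord(long offset) { return offset; } // pretend parse
    }

    public static void main(String[] args) throws Exception {
        // One reader per worker thread: no shared mutable state, but the
        // per-instance memory is multiplied by the thread count.
        ThreadLocal<DumpReader> readers =
                ThreadLocal.withInitial(DumpReader::new);
        CyclicBarrier barrier = new CyclicBarrier(4); // force 4 live threads
        ExecutorService pool = Executors.newFixedThreadPool(4);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                try {
                    barrier.await();
                } catch (Exception e) {
                    throw new RuntimeException(e);
                }
                return readers.get().readRecord(0);
            });
        }
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        System.out.println("readers created: " + DumpReader.created.get());
    }
}
```

If the heavy in-memory state (the index) could be built once and shared read-only between per-thread cursors, the multiplier would apply only to the small cursor state rather than the whole dump - but that would require restructuring the reader.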
>> That's all I have made so far. I didn't track what in HProfFile consumes
>> CPU and memory; my blind guess is that CachedRandomAccesDataProvider is
>> involved.
> Thanks for this Lukasz - you are probably the first person to use this
> code other than the developers, and it's great to get some feedback. Can
> you share the code you used to create the dump and to visit the HPROF
> records? Stuart has made some performance adjustments to the hprof code,
> and we'll see if we can do better.
> On the spec list we're discussing the basics of a "snapshot" dump concept
> where only what you need gets dumped. I wonder if the same idea could be
> applied to opening a dump. It would be great to know when reading a dump
> that certain information is not required - that should improve
> performance.
> Regards
>> Lukasz

Stuart Monteith
