From Lukasz <>
Subject Re: Processing huge heap dump.
Date Mon, 11 Jan 2010 20:59:44 GMT
Hi Stuart, Steve,

I've taken a deeper look into the code. I still haven't carefully traced
the index calculation in the BitMaskMappingArray and
ArrayBitMaskMappingStrategy classes, but I managed to improve performance
by increasing the array size in those classes (which is set in the
HProfFile class).

If I understand the code correctly, when the capacity of
BitMaskMappingArray is exhausted, bucketSize is doubled, which in turn
means that more reads (even cached ones) are required to set position of
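
To illustrate the effect described above, here is a minimal sketch of a
fixed-capacity sparse offset index. It is not the actual
BitMaskMappingArray code: the class SparseIndexSketch, its methods, and
the 40-byte record length are all hypothetical, and the halving policy is
only my assumption about how such an index behaves when it fills up.

```java
/**
 * Hypothetical sketch of a fixed-capacity sparse offset index.
 * NOT the Kato BitMaskMappingArray implementation; names and the
 * halving policy are assumptions made for illustration only.
 */
public class SparseIndexSketch {
    private final long[] offsets; // file offset of every bucketSize-th record
    private int used = 0;         // number of slots currently filled
    private long bucketSize = 1;  // records represented by one slot
    private long count = 0;       // records added so far

    public SparseIndexSketch(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be >= 1");
        }
        offsets = new long[capacity];
    }

    /** Registers the file offset of the next record. */
    public void add(long offset) {
        boolean onBoundary = count % bucketSize == 0;
        if (onBoundary && used == offsets.length) {
            halve(); // index is full: make each slot cover twice as many records
            onBoundary = count % bucketSize == 0;
        }
        if (onBoundary) {
            offsets[used++] = offset;
        }
        count++;
    }

    /** Keeps every second slot and doubles the bucket size. */
    private void halve() {
        int kept = 0;
        for (int i = 0; i < used; i += 2) {
            offsets[kept++] = offsets[i];
        }
        used = kept;
        bucketSize *= 2;
    }

    public long bucketSize() {
        return bucketSize;
    }

    /** Offset of the closest indexed record at or before recordIndex. */
    public long nearestOffset(long recordIndex) {
        return offsets[(int) (recordIndex / bucketSize)];
    }

    /** Records that must be read sequentially after seeking to nearestOffset. */
    public long skipsNeeded(long recordIndex) {
        return recordIndex % bucketSize;
    }

    public static void main(String[] args) {
        final long recordLength = 40; // assumed fixed record length in bytes
        SparseIndexSketch small = new SparseIndexSketch(1000);
        SparseIndexSketch large = new SparseIndexSketch(1000000);
        for (long i = 0; i < 1000000; i++) {
            small.add(i * recordLength);
            large.add(i * recordLength);
        }
        long last = 999999;
        System.out.println("small: bucketSize=" + small.bucketSize()
                + ", sequential reads for last record=" + small.skipsNeeded(last));
        System.out.println("large: bucketSize=" + large.bucketSize()
                + ", sequential reads for last record=" + large.skipsNeeded(last));
    }
}
```

In this sketch the 1000-slot index ends up with a bucket size of 1024
after one million records, so finding a record can require reading
through up to 1023 preceding records, while the 1000000-slot index keeps
a bucket size of 1 and needs no extra reads. The price is the larger
array that has to be held in memory.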

Below are the loading times for the default array size (1000) and the
increased one (1000000). The test was run against a generated dump file
(5000000 instances of Data).
Default (1000):
HeapSubRecord: 100000 (866ms, 4215kB)
HeapSubRecord: 200000 (1716ms, 7879kB)
HeapSubRecord: 300000 (2833ms, 11263kB)
HeapSubRecord: 400000 (3889ms, 14283kB)
HeapSubRecord: 500000 (3893ms, 17319kB)
HeapSubRecord: 600000 (7248ms, 20479kB) (here probably bucketSize was 
HeapSubRecord: 700000 (7721ms, 23531kB)
HeapSubRecord: 800000 (7729ms, 26567kB)
HeapSubRecord: 900000 (7731ms, 29671kB)
HeapSubRecord: 1000000 (7704ms, 32731kB)
... (I didn't wait until end)

Increased (1000000):
HeapSubRecord: 100000 (622ms, 17809kB)
HeapSubRecord: 200000 (309ms, 20345kB)
HeapSubRecord: 300000 (283ms, 23861kB)
HeapSubRecord: 400000 (274ms, 27921kB)
HeapSubRecord: 500000 (269ms, 29957kB)
HeapSubRecord: 600000 (264ms, 31993kB)
HeapSubRecord: 700000 (272ms, 36097kB)
HeapSubRecord: 800000 (288ms, 37739kB)
HeapSubRecord: 900000 (263ms, 39835kB)
HeapSubRecord: 1000000 (259ms, 41931kB)
HeapSubRecord: 1100000 (300ms, 44773kB)
HeapSubRecord: 1200000 (283ms, 46901kB)
HeapSubRecord: 1300000 (291ms, 49029kB)
HeapSubRecord: 1400000 (328ms, 53801kB)
HeapSubRecord: 1500000 (259ms, 53801kB)
HeapSubRecord: 1600000 (272ms, 58125kB)
HeapSubRecord: 1700000 (264ms, 60293kB)
HeapSubRecord: 1800000 (264ms, 62473kB)
HeapSubRecord: 1900000 (361ms, 61373kB)
HeapSubRecord: 2000000 (274ms, 63105kB)
...
HeapSubRecord: 9000000 (284ms, 231969kB)
HeapSubRecord: 9100000 (272ms, 233597kB)
HeapSubRecord: 9200000 (281ms, 236357kB)
HeapSubRecord: 9300000 (274ms, 240469kB)
HeapSubRecord: 9400000 (279ms, 244541kB)
HeapSubRecord: 9500000 (269ms, 246549kB)
HeapSubRecord: 9600000 (279ms, 250565kB)
HeapSubRecord: 9700000 (265ms, 252573kB)
HeapSubRecord: 9800000 (279ms, 256629kB)
HeapSubRecord: 9900000 (265ms, 258669kB)
HeapSubRecord: 10000000 (463ms, 263997kB)

For comparison, my 60GB dump file contains more than 1 100 000 000
objects (if I remember correctly).


Stuart Monteith wrote:
> The hprof dump reader spends a lot of time reading the whole file, for 
> various reasons.
> The indices it has in memory are constructed through an initial read, 
> and this is also
> the source of the memory usage. In addition, there is some correlation 
> to be done which
> also takes up time, and induces yet more reading.
> I'm sure some work could be done to improve the performance further, 
> but we'll have to
> look at the tradeoff between diskspace and memory usage. The hprof 
> file format itself
> is what it is, however, and we have no influence over that. The CJVMTI 
> agent has lots of
> room for improvement, but I suspect its potential for improvement is 
> unlikely to be much better
> than existing hprof implementations. The built-in JVM hprof dumper 
> will probably be a hard act
> to follow.
> The HProf implementation is not thread-safe. Realistically, I think it 
> is something that ought to
> be considered once things are more mature. There will be algorithms 
> that can deal with the JVM
> structure sensibly.
> And thanks Lukasz, it's great to have your input.
> Regards,
>     Stuart

