harmony-dev mailing list archives

From Tim Ellison <t.p.elli...@gmail.com>
Subject Re: [classlib] HashMap optimization (again)
Date Wed, 16 Jan 2008 17:31:04 GMT
Aleksey Shipilev wrote:
> Hi again, Tim.
> 
> So I spent another day on this issue. I gathered a profile of
> SPECjbb2005 and grepped out the HashMap methods (I had to disable
> inlining, so the exact numbers differ from an actual performance run):

Thanks for spending the time to look into the issue, Aleksey; it is much
appreciated.

> Current implementation:
> 
> 6.99% HashMap.findNonNullKeyEntry(Ljava/lang/Object;II)Ljava/util/HashMap$Entry;
> 0.61% HashMap.getEntry(Ljava/lang/Object;)Ljava/util/HashMap$Entry;
> 0.25% HashMap.get(Ljava/lang/Object;)Ljava/lang/Object;	
> ---------------
> 7.86% Total
> 			
> H5374:			
> 
> 6.01% HashMap.findNonNullKeyEntryInteger(Ljava/lang/Object;II)Ljava/util/HashMap$Entry;
> 0.67% HashMap.findNonNullKeyEntryLegacy(Ljava/lang/Object;II)Ljava/util/HashMap$Entry;
> 0.61% HashMap.getEntry(Ljava/lang/Object;)Ljava/util/HashMap$Entry;
> 0.42% HashMap.get(Ljava/lang/Object;)Ljava/lang/Object;	
> 0.39% HashMap.findNonNullKeyEntry(Ljava/lang/Object;II)Ljava/util/HashMap$Entry;
> ---------------
> 7.05% Total
> 
> Percentages are clock-tick percentages of the entire workload.
> So the profile shows that the H5374 code is actually faster.
> 
> Then, after a talk with Sergey Kuksenko (credit to him :)), I
> tried to compare these two implementations without allocPrefetch,
> which prefetches the memory for newly created objects and thus
> incurs high cache pressure. allocPrefetch itself gives hu-u-uge
> boosts, but can expose cache limitations for other optimizations. So,
> with allocPrefetch disabled:
> 
> Windows x86
> 100.0% Harmony-clean
> 101.1% Harmony + H5374
> 
> Windows x86_64
> 100.0% Harmony-clean
> 100.5% Harmony + H5374
> 
> That's the boost I'm looking for! I wonder why such a positive change as
> manual unboxing changes L2 cache access patterns so that it gives a boost
> in normal mode and a degradation in the presence of a heavy L2 cache user.
> 
> I have also remeasured all modes carefully, so here is the
> conclusion on this issue:
> 
> Windows x86:
>  100.0%  [base]   Harmony-clean
>  100.2%  [+0.2%]  Harmony-clean + H5374
>   88.6%  [base]   Harmony-clean - allocPrefetch
>   89.6%  [+1%]    Harmony-clean - allocPrefetch + H5374
> 
> Windows x86_64:
>  100.0%  [base]   Harmony-clean
>  100.1%  [+0.1%]  Harmony-clean + H5374
>   88.9%  [base]   Harmony-clean - allocPrefetch
>   89.3%  [+0.5%]  Harmony-clean - allocPrefetch + H5374
> 
> ...measurement uncertainty is about 0.4%.
> 
> Based on this data I would say this patch can't get much of a boost on
> DRLVM, since DRLVM's optimizations already do their job of scalarization
> just fine. The patch should also increase cache locality, and that seems
> to be the case in the absence of another L2 cache contributor. Add that
> such specialization bloats the code a little, and I'd jump to the conclusion
> that from the DRLVM side it would be better to keep the patch out of trunk.

Fair enough (though it looks like a minor improvement, right?).
I'm happy to leave the patch out.

Can I go back a moment to hear about the scalar replacement technique in 
Jitrino?  Feel free to point me to some doc or code if that is easier.

As you know, my goal was to avoid the key dereferencing when searching 
the hashmap by, as you say, unboxing the Integer and encoding the value 
in the hashcode int field.  The key field is still an object ptr to the 
original Integer object which is required for answering the keySet etc.
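
To make the idea concrete, here is a minimal sketch of that kind of lookup. It is not the H5374 patch or Harmony's HashMap: the class IntKeyedTable, the isIntegerKey flag and the storedKey helper are hypothetical names invented for illustration, and null keys and resizing are left out. It relies only on the fact that Integer.hashCode() returns the int value itself, so for Integer keys the cached hash already holds the unboxed value.

```java
import java.util.Objects;

// Hypothetical illustration only; not Harmony's HashMap or the H5374 patch.
final class IntKeyedTable<V> {
    private static final class Entry<V> {
        final Object key;           // original key object, kept for keySet-style views
        final int hash;             // for Integer keys this equals the unboxed value
        final boolean isIntegerKey; // assumed flag so the lookup need not touch the key
        V value;
        Entry<V> next;

        Entry(Object key, int hash, boolean isIntegerKey, V value, Entry<V> next) {
            this.key = key;
            this.hash = hash;
            this.isIntegerKey = isIntegerKey;
            this.value = value;
            this.next = next;
        }
    }

    private final Entry<V>[] buckets;

    @SuppressWarnings("unchecked")
    IntKeyedTable(int capacity) {
        buckets = (Entry<V>[]) new Entry[capacity];
    }

    private int indexFor(int hash) {
        return (hash & 0x7FFFFFFF) % buckets.length;
    }

    private Entry<V> findEntry(Object key) {
        Objects.requireNonNull(key);
        int hash = key.hashCode();
        Entry<V> e = buckets[indexFor(hash)];
        if (key instanceof Integer) {
            // Specialized path: compare ints only, never dereference e.key.
            while (e != null && !(e.isIntegerKey && e.hash == hash)) {
                e = e.next;
            }
        } else {
            // General path: hash check first, then equals() on the stored key.
            while (e != null && !(e.hash == hash && key.equals(e.key))) {
                e = e.next;
            }
        }
        return e;
    }

    V put(Object key, V value) {
        Entry<V> e = findEntry(key);
        if (e != null) {
            V old = e.value;
            e.value = value;
            return old;
        }
        int hash = key.hashCode();
        int i = indexFor(hash);
        buckets[i] = new Entry<>(key, hash, key instanceof Integer, value, buckets[i]);
        return null;
    }

    V get(Object key) {
        Entry<V> e = findEntry(key);
        return e == null ? null : e.value;
    }

    // Returns the originally stored key object (the 'box'), or null if absent.
    Object storedKey(Object key) {
        Entry<V> e = findEntry(key);
        return e == null ? null : e.key;
    }
}
```

The flag is there because a key of another class can share the same hash (a Long holding a small value, for instance), so the int comparison alone would not be enough to tell the entries apart.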

So how does Jitrino both unbox the primitive and preserve the 'box' for 
when it must be returned? [If you see what I mean, otherwise I'll try 
and rephrase it]

> There is one more possible opportunity: tuning the prefetch distance
> in allocPrefetch, but that's a fragile thing to optimize.

Yeah, but no need to perform unnatural acts.  We can leave it out if 
there is no benefit to Harmony.

Regards,
Tim
