Return-Path: Delivered-To: apmail-harmony-dev-archive@www.apache.org Received: (qmail 14634 invoked from network); 29 Jun 2007 14:14:10 -0000 Received: from hermes.apache.org (HELO mail.apache.org) (140.211.11.2) by minotaur.apache.org with SMTP; 29 Jun 2007 14:14:10 -0000 Received: (qmail 11074 invoked by uid 500); 29 Jun 2007 14:14:02 -0000 Delivered-To: apmail-harmony-dev-archive@harmony.apache.org Received: (qmail 11042 invoked by uid 500); 29 Jun 2007 14:14:02 -0000 Mailing-List: contact dev-help@harmony.apache.org; run by ezmlm Precedence: bulk List-Help: List-Unsubscribe: List-Post: List-Id: Reply-To: dev@harmony.apache.org Delivered-To: mailing list dev@harmony.apache.org Received: (qmail 11014 invoked by uid 99); 29 Jun 2007 14:14:02 -0000 Received: from herse.apache.org (HELO herse.apache.org) (140.211.11.133) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 29 Jun 2007 07:14:02 -0700 X-ASF-Spam-Status: No, hits=-0.0 required=10.0 tests=SPF_PASS X-Spam-Check-By: apache.org Received-SPF: pass (herse.apache.org: domain of t.p.ellison@gmail.com designates 66.249.90.176 as permitted sender) Received: from [66.249.90.176] (HELO ik-out-1112.google.com) (66.249.90.176) by apache.org (qpsmtpd/0.29) with ESMTP; Fri, 29 Jun 2007 07:13:58 -0700 Received: by ik-out-1112.google.com with SMTP id c30so821221ika for ; Fri, 29 Jun 2007 07:13:36 -0700 (PDT) DKIM-Signature: a=rsa-sha1; c=relaxed/relaxed; d=gmail.com; s=beta; h=domainkey-signature:received:received:message-id:date:from:user-agent:mime-version:to:subject:references:in-reply-to:x-enigmail-version:content-type:content-transfer-encoding; b=clUhjrWtZ4sGZD32BA34nat+g6Hs7OonOasROH9gvbR0WcAmmEKQ8D08MEBkMZBBdb8fmubTXCdcPPnYDAYBMBBw9jZ2+z8PgErPFnEDshnDI69gkfjs1lVS9walskINR6Qhv4Ks0KVclJ5Kl5reb6K0A21ESjiTwQBQgAXDYAo= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=beta; h=received:message-id:date:from:user-agent:mime-version:to:subject:references:in-reply-to:x-enigmail-version:content-type:content-transfer-encoding; b=WN+YYt0WCCK8/OEUiTRrLVAXnK329mQ6hMJZprWG0vyFCUEBVL8JSqYpUfJoUnjT4H2LEbsRTJvxSuWd6UHZw6zpQyd+NAckarfJKrC9CKDetL2oGwHNUSRZ/aSlonXz9pm62kxqFFqZEmIJM12Dd//A6uFbBzn4BL0940hh2ZE= Received: by 10.78.137.7 with SMTP id k7mr1623277hud.1183126416247; Fri, 29 Jun 2007 07:13:36 -0700 (PDT) Received: from ?192.168.0.2? ( [86.111.176.100]) by mx.google.com with ESMTP id i4sm12199011nfh.2007.06.29.07.13.35 (version=SSLv3 cipher=RC4-MD5); Fri, 29 Jun 2007 07:13:35 -0700 (PDT) Message-ID: <46851389.10603@gmail.com> Date: Fri, 29 Jun 2007 15:13:29 +0100 From: Tim Ellison User-Agent: Thunderbird 2.0.0.4 (Windows/20070604) MIME-Version: 1.0 To: dev@harmony.apache.org Subject: Re: Performance improvement of java.util.HashMap References: <5c8e69f0706270310k38a93b99oa84e92d7d54220c4@mail.gmail.com> <253b20230706290634x12740163h6f18dedadf7dbc7c@mail.gmail.com> In-Reply-To: <253b20230706290634x12740163h6f18dedadf7dbc7c@mail.gmail.com> X-Enigmail-Version: 0.95.1 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 7bit X-Virus-Checked: Checked by ClamAV on apache.org Sergey Kuksenko wrote: > I wish to include myself into discussion and express some my thought > in this area. Great -- the more input the better! > I am absolutely agree with the idea of using bitwise operations & (or > power of two remainder) instead of % (arbirary remainder) because of > power of two remainder is really faster and was proved by my tests > and etc... > > First of all I wish to note that if HashMap works with user-defined > HashCode it doesn't matter which reminder is used: power fo two (&) > or prime(or arbitrary) number reminder (%). In case if user defined > HashCode we can't predict hash and the hash can be suitable as for & > and for %. Right, I think we pretty much have to ignore the user defined hashCode() since we have no insight into its characteristics, other than the obvious ones like it being a 32-bit value. > But prime number reminder has one disadavntages (except speed of > operation). When HashMap uses prime number (or some irreglar) we get > negative effect of unused part of memory page. Size of memory pages > is power of two and it is better to fill it completely. Espesially it > is visible on large hash maps. Unfortunately I can privide any > numbers at the moment. Thus in my opinion power of two reminder gives > boost in time of operation plus benefits from more natural memory > usage (layouts etc.) Good point, for the memory page occupancy of the contiguous 'elementData' array of map entries. In a somewhat related thought, it may be worth experimenting with pre-allocating a number of Entries, to get better memory locality for the whole hash map, but I digress... > As to default hashCode which is IdentityHashCode. I don't know what object types are the most common for hashmap keys. For example, if it turns out that keys are usually Strings then the hashCode() values' characteristics would be different. But I agree that we should do well for identityHashCode values since that probably represents the biggest set of key types we want to accommodate. > There are two ways here: > 1. Add new hashing function (additional hashing) inside HashMap. As was > suggested here. > 2. Change IdentityHashCode implementation. There are no rules that > IdentityHashCode should return exactly address of object. And last > zero bits from IdentityHashCode (which caused by object alignment) > can be removed inside IdentityHashCode. Other words - to add > additional hashing inside IdentityHashCode. There is no rule that identityHashCode should be the address, you are right - but I think that is a common implementation. We would like the class library code to work well on multiple-VM implementations. > The first solution has one advantage - it can be done only in > classlib. The second variant should be implemented in all VMs which > are based on Harmony's classlib. But this way has other advantages - > Well-formed IdentityHashCode will helps not only for HashMap, but for > Hashtable, IdentityHashMap Well we could use whatever technique we settle on in all our hashed collections > and moreover for all hashing implemented by users. how so? unless they query the object identityhashCode, which I think is not usually the case when overriding hashCode(). > In the second variant we don't add rehashing function into HashMap. > It is common practice when advanced Java developers tune their > HashCode functions. If we add rehashing function into HashMap we may > break their tuning, without knowledge of internal rehashing functions > it is impossible to tune HashCode function. I agree that only applying a very simple transformation would be appropriate so as not to 'break' users' hash functions. Plus it has the advantage of not taking a long time in our code. That's why I had suggested a simple rotate right operation (to help with the common implementation of identityHashCode situation). > From my opinion we should think about the last argument against adding > rehashing into HashMap and move rehashing into VM. So you would recommend using the key.hashCode() unmodified? Regards, Tim