commons-user mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Henry Story <henry.st...@bblfish.net>
Subject Re: An improvement to AbstractHashedMap, and a new HashedSet class
Date Wed, 11 Feb 2004 00:16:21 GMT
I am using the collections library from CVS and did not have a problem 
with the minor re-factoring. After a few quick cosmetic changes, 
everything works fine as far as I can tell.

I have placed a couple of UML diagrams on my web site with the code to 
help explain what I am doing. You can find the files at: 
http://bblfish.net/java/collections/

The collections.jpg picture is a quick sketch of the current 
collections.map class hierarchy.  As you can see AbstractHashedMap 
contains a collection of HashEntry objects. Each of these have a key 
and a value. Various subclasses of AbstractHashedMap implement 
different specialisations of hashes, some of which require 
specialisations of HashedEntry (eg IdentityMap requires an 
IdentityEntry).
This diagram does not show the inner class relationships. For example 
HashedEntry is an inner class of AbstractHashedMap.

What I am proposing is really only a very small change. (Even if the 
next diagram looks very different)

If you looks at collections-altered.jpg you will see that the only 
novelty is that HashEntry is now an interface, with most of the code 
that it used to contain now in DefaultHashEntry. I have shown the 
packages nested inside one another to make clear the static inner 
class/interface relationships. As you see DefaultHashEntry has three 
fields, but ReflexiveHashEntry only has two, since it is a data 
structure that is used for a set where they map.key = map.value.

The only change needed is the interface change - making HashEntry an 
interface.
This allows us to bypass the default implementation of HashEntry.

I just added the HashedSet as an example of how it could come in useful.

On 10 Feb 2004, at 18:57, Stephen Colebourne wrote:

> Unfortunately, as version 3 of [collections] has now been released with
> HashEntry as a class, that cannot be changed to an interface.
>
> I would however, support the addition of a similar AbstractHashedSet 
> class
> to [collections], preferably with a default HashedSet implementation.
>
> Stephen
>
>
> ----- Original Message -----
> From: "Henry Story" <henry.story@bblfish.net>
>
>
>> Hi,
>>
>> A little background to this contribution is in order. I was looking at
>> an interesting framework for Value Objects written by Dirk Riehle,
>> available under the lgpl at http://www.jvalue.org/, when I found a
>> small bug in his code. It was obvious that what was needed was a
>> WeakHashedSet - a HashSet that would wrap its contents in
>> WeakReference's. Given the difficulty of extending java.util classes I
>> opted to look at the apache collections package instead.
>>
>> I first decided to create a HashedSet class, an equivalent of
>> java.net.HashSet. The best way to do this I realised, would be to
>> extend the AbstractHashedMap. The JavaDoc for this class says:
>>
>>> This class implements all the features necessary for a subclass
>> hash-based map.
>>> Key-value entries are stored in instances of the HashEntry class,
>> which can be
>>> overridden and replaced. The iterators can similarly be replaced,
>> without the
>>> need to replace the KeySet, EntrySet and Values view classes.
>>
>> But in fact it is not quite as extensible as it could be because the
>> HashEntry is currently a class, not an interface. Let me explain by
>> taking the simpler example of rewriting the java.util.HashSet class.
>>
>> HashSet
>>      map -> HashMap
>>               entries -> HashEntries[]
>>                       key -> Object
>>                            value -> Object
>>                       next -> HashEntry
>>
>> This picture show that HashSet contains a pointer to a HashMap, which
>> contains a pointer to an Array of HashEntries. Each HashEntry contains
>> a key pointer, a value pointer and a pointer to the next HashEntry.
>>
>> A HashSet uses the HashMap because the keys in a HashMap are unique. 
>> As
>> a result the value pointer in the HashEntry has really no role to 
>> play.
>> The way AbstractHashedMap is written in the apache collection means
>> that to simply follow the above pattern would waste one pointer for
>> every HashEntry, that is 4 bytes wasted for every entry.
>>
>> The solution is simple.
>>   1. I turned the HashEntry inner class into an inner interface of
>> AbstractHashedMap.
>>   2. I added a few methods to get/set the key and the value. These
>> methods should be inlined by all modern compilers, so I don't think
>> there is any efficiency problem here.
>>   3. I then wrote a HashedSet which has an inner class that extends
>> AbstractHashedMap, but which takes a modified HashEntry class where 
>> the
>> key and the value are the same object. The resulting
>> net.bblfish.util.HashedSet is thus more memory efficient than
>> java.util.HashSet.
>>
>> HashedSet
>>      map -> SpecialisedHashMap
>>               entries -> HashEntries[]
>>                       key -> Object
>>                            next -> HashEntry
>>
>>
>> Perhaps it would be worth adding these changes to the library.
>>
>> If the change to
>> org.apache.commons.collections.map.AbstractHashedMap.HashEntry is
>> accepted then it will make it relatively easy to write a 
>> WeakHashedSet.
>> It will be just a matter of extending WeakReference by implementing 
>> the
>> HashEntry methods.
>>
>> WeakHashedSet
>>      map -> SpeacialWeakHashMap
>>               entries -> WeakHashEntries[] extends WeakReference
>>                                            implements HashEntry
>>                       key -> Object
>>                            next -> HashEntry
>>
>>
>> Henry


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Mime
View raw message