commons-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Henry Story <henry.st...@bblfish.net>
Subject An improvement to AbstractHashedMap, and a new HashedSet class
Date Mon, 09 Feb 2004 21:50:24 GMT
Hi,

A little background to this contribution is in order. I was looking at 
an interesting framework for Value Objects written by Dirk Riehle, 
available under the lgpl at http://www.jvalue.org/, when I found a 
small bug in his code. It was obvious that what was needed was a 
WeakHashedSet - a HashSet that would wrap its contents in 
WeakReference's. Given the difficulty of extending java.util classes I 
opted to look at the apache collections package instead.

I first decided to create a HashedSet class, an equivalent of 
java.net.HashSet. The best way to do this I realised, would be to 
extend the AbstractHashedMap. The JavaDoc for this class says:

 > This class implements all the features necessary for a subclass 
hash-based map.
 > Key-value entries are stored in instances of the HashEntry class,  
which can be
 > overridden and replaced. The iterators can similarly be replaced,  
without the
 > need to replace the KeySet, EntrySet and Values view classes.

But in fact it is not quite as extensible as it could be because the 
HashEntry is currently a class, not an interface. Let me explain by 
taking the simpler example of rewriting the java.util.HashSet class.

HashSet
     map -> HashMap
              entries -> HashEntries[]
	                      key -> Object
                           value -> Object
	                      next -> HashEntry

This picture show that HashSet contains a pointer to a HashMap, which 
contains a pointer to an Array of HashEntries. Each HashEntry contains 
a key pointer, a value pointer and a pointer to the next HashEntry.

A HashSet uses the HashMap because the keys in a HashMap are unique. As 
a result the value pointer in the HashEntry has really no role to play. 
The way AbstractHashedMap is written in the apache collection means 
that to simply follow the above pattern would waste one pointer for 
every HashEntry, that is 4 bytes wasted for every entry.

The solution is simple.
  1. I turned the HashEntry inner class into an inner interface of 
AbstractHashedMap.
  2. I added a few methods to get/set the key and the value. These 
methods should be inlined by all modern compilers, so I don't think 
there is any efficiency problem here.
  3. I then wrote a HashedSet which has an inner class that extends 
AbstractHashedMap, but which takes a modified HashEntry class where the 
key and the value are the same object. The resulting 
net.bblfish.util.HashedSet is thus more memory efficient than 
java.util.HashSet.

HashedSet
     map -> SpecialisedHashMap
              entries -> HashEntries[]
	                      key -> Object
                           next -> HashEntry


Perhaps it would be worth adding these changes to the library.

If the change to 
org.apache.commons.collections.map.AbstractHashedMap.HashEntry is 
accepted then it will make it relatively easy to write a WeakHashedSet. 
It will be just a matter of extending WeakReference by implementing the 
HashEntry methods.

WeakHashedSet
     map -> SpeacialWeakHashMap
              entries -> WeakHashEntries[] extends WeakReference
                                           implements HashEntry
	                      key -> Object
                           next -> HashEntry


Henry

I have placed the changes in a tar file at:
http://bblfish.net/tmp/changed.tar


---------------------------------------------------------------------
To unsubscribe, e-mail: commons-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-dev-help@jakarta.apache.org


Mime
View raw message