db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Knut Anders Hatlen <Knut.Hat...@Sun.COM>
Subject Re: Some questions about FormatableBitSet
Date Tue, 12 Dec 2006 14:22:31 GMT
Dyre.Tjeldvoll@Sun.COM writes:

> There are a couple of things in the implementation of this class that
> I just don't understand:
> - The doc for the method grow() says:
>   "ASSUMPTIONS: that all extra bits in the last byte are zero". But in
>   other methods (or() and()...) great care is taken mask out the bits
>   of the last byte that isn't part of the bitset. But if it is an
>   invariant of the class that these unused bits are zero, it should be
>   possible to do most operations byte by byte with no special handling
>   of the last byte, right?

That seems like a safe assumption to me. I read through the class and
couldn't find any way to put non-zero values into the unused bits. In
particular, grow(), shrink(), concatenate(), set() and the
constructors ensure that extra bits are zero, so that should make it
safe to count bits and perform and/or/xor byte by byte.

Oh, by the way, I found one way to put garbage into the unused
bits. getByteArray() returns the internal byte array (not a copy), so
it might be manipulated directly. But if someone did that, they would
have to expect unpredictable behaviour. For instance, the hashCode()
method would not handle that correctly.

> - in or() (but not in and() or xor()) an explicit test is done on
>   whether the argument of type FormatableBitSet is "instanceof
>   FormatableBitSet". Will it not always be? It could have been a
>   subclass I suppose, but since FormatableBitSet is final that becomes
>   a bit hypothetical.

Well, final or not, a subclass will still be an instance of
FormatableBitSet, and since null is handled earlier, the code in the
else clause can never be reached. In my opinion, removing the if/else
would greatly improve the readability of that method.

> - A minor nit: This class appears to do a lot of division and
>   remainder calculations with powers of 2 (frequently 8). Is it not
>   generally preferably to do this with shifts and  bitwise ANDs?

For the divisions, I agree with Dan that we should expect that the
compiler or the JVM is able to rewrite them to shift operations. For
the remainder calculations, (pos & 7) could in theory be faster than
(pos % 8) since it cannot be optimized automatically (because of the
different handling of negative values). However, I wouldn't expect a
noticeable difference...

> - The algorithm for finding the number of bits set seems a bit
>   simple (iterating over all bits, and counting those for which isSet()
>   returns true). A simple google search will give a number of ways of
>   calculating the number of bits in a byte quickly. Is there a reason
>   not to use such an algorithm and iterate over the bytes, rather than
>   the bits?

I would guess it was written this way because it was the easiest way
to implement it. If it had been written for a newer JDK (1.5 and
higher), it would have been just as easy (and more efficient) to use
Integer.bitCount() and count byte by byte. At least for large bit
sets, I would expect the algorithms you have found to be more
efficient than the current bit by bit algorithm.

Knut Anders

View raw message