db-derby-dev mailing list archives

From Jack Klebanoff <kleba...@Mutagen.Net>
Subject BackingStoreHashtable
Date Fri, 10 Dec 2004 03:46:02 GMT
Derby uses the class BackingStoreHashtable to implement the hash table
used in hash joins. Despite the class's name, it does not spill to disk;
the hash table is always held in memory.
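
To illustrate what that means (using a plain java.util.HashMap here, not
the actual BackingStoreHashtable API), a hash join builds a table over one
input and then probes it with the other, so the entire build input has to
sit on the Java heap:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // Rough illustration of a hash join, not Derby code: the build input is
    // loaded into an in-memory map keyed on its join column, then each probe
    // row looks up matches by its own join column.
    public class HashJoinSketch {
        public static List<Object[]> join(List<Object[]> buildRows, int buildKeyCol,
                                          List<Object[]> probeRows, int probeKeyCol) {
            // Build phase: every build row is kept in memory.
            Map<Object, List<Object[]>> table = new HashMap<>();
            for (Object[] row : buildRows) {
                table.computeIfAbsent(row[buildKeyCol], k -> new ArrayList<>()).add(row);
            }
            // Probe phase: look up each probe row's key and emit the joined rows.
            List<Object[]> result = new ArrayList<>();
            for (Object[] probe : probeRows) {
                List<Object[]> matches = table.get(probe[probeKeyCol]);
                if (matches == null) {
                    continue;
                }
                for (Object[] match : matches) {
                    Object[] joined = new Object[match.length + probe.length];
                    System.arraycopy(match, 0, joined, 0, match.length);
                    System.arraycopy(probe, 0, joined, match.length, probe.length);
                    result.add(joined);
                }
            }
            return result;
        }
    }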

The optimizer tries to avoid hash joins when the input is large because 
BackingStoreHashtable would use too much memory. The optimizer just 
works with row count estimates, so there is no guarantee that 
BackingStoreHashtable will not blow up. No customer has complained about 
this yet.

I would like to work on this, changing BackingStoreHashtable to spill to 
disk when the hash table gets large, and changing the optimizer to 
understand that large hash tables are allowed, but more costly.

The advantages of doing this are that
1. hash joins will not blow up when the compiler's row count estimate 
is too low,
2. the optimizer can choose hash joins on large inputs when that is the 
least costly join implementation, and
3. we can use BackingStoreHashtable to implement features such as 
INTERSECT and GROUP BY, though I am not proposing to do so now.

I am not proposing to implement a hash table that can be used as an 
alternative to Btrees for a secondary index. BackingStoreHashtable is 
used for transient data structures that are only accessed by a single 
thread. A secondary index implementation must deal with locking and must 
implement hashCode methods that are JVM independent. This is much more 
work and would yield a slower implementation.

I propose that BackingStoreHashtable should start off using an in-memory 
hash table even if the estimated row count is large. That way it will 
use an in-memory hash table when the actual row count is small enough 
for the table to fit in memory. When it finds that spilling to disk is 
necessary, BackingStoreHashtable will use the estimated row count to 
determine the initial number of buckets and move the in-memory entries 
to disk. The disk-based hash table will use a linear hashing algorithm; 
see "External Memory Algorithms and Data Structures: Dealing with 
Massive Data", Jeffrey Scott Vitter, ACM Computing Surveys, Vol. 33, 
No. 2, June 2001, pp. 209-271. Linear hashing grows the hash table one 
bucket at a time when the average number of entries per bucket exceeds 
a threshold. The disadvantage of growing by a larger number of buckets 
at a time is that the last expansion may be unnecessarily large, 
wasting time and space.
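
To make the growth rule concrete, here is a rough in-memory sketch of
linear hashing (an illustration only, not the proposed patch: the real
bucket contents would live on disk pages, and the threshold of 4 entries
per bucket is just a placeholder value):

    import java.util.ArrayList;
    import java.util.List;

    // Rough sketch of linear hashing. The table starts with a number of
    // buckets derived from the estimated row count and grows one bucket at a
    // time by splitting the bucket at the split pointer.
    public class LinearHashingSketch<K, V> {

        private static class Entry<K, V> {
            final K key;
            final V value;
            Entry(K key, V value) { this.key = key; this.value = value; }
        }

        private final List<List<Entry<K, V>>> buckets = new ArrayList<>();
        private final int initialBuckets; // from the estimated row count
        private int level = 0;            // completed doublings of the table
        private int split = 0;            // next bucket to be split
        private int size = 0;
        private static final double MAX_AVG_PER_BUCKET = 4.0; // placeholder

        public LinearHashingSketch(int initialBuckets) {
            this.initialBuckets = initialBuckets;
            for (int i = 0; i < initialBuckets; i++) {
                buckets.add(new ArrayList<>());
            }
        }

        // Bucket address: hash modulo the current nominal table size, except
        // that buckets before the split pointer have already been split and
        // are addressed with the next, doubled modulus.
        private int bucketFor(K key) {
            int h = key.hashCode() & 0x7fffffff;
            int b = h % (initialBuckets << level);
            if (b < split) {
                b = h % (initialBuckets << (level + 1));
            }
            return b;
        }

        public void put(K key, V value) {
            buckets.get(bucketFor(key)).add(new Entry<>(key, value));
            size++;
            // Grow one bucket at a time once the average chain length is too high.
            if ((double) size / buckets.size() > MAX_AVG_PER_BUCKET) {
                splitOneBucket();
            }
        }

        public V get(K key) {
            for (Entry<K, V> e : buckets.get(bucketFor(key))) {
                if (e.key.equals(key)) {
                    return e.value;
                }
            }
            return null;
        }

        private void splitOneBucket() {
            // Redistribute the bucket at the split pointer between itself and
            // a single new bucket appended at the end of the table.
            List<Entry<K, V>> old = buckets.get(split);
            List<Entry<K, V>> stay = new ArrayList<>();
            List<Entry<K, V>> moved = new ArrayList<>();
            buckets.add(moved);
            int newModulus = initialBuckets << (level + 1);
            for (Entry<K, V> e : old) {
                int h = e.key.hashCode() & 0x7fffffff;
                if (h % newModulus == split) {
                    stay.add(e);
                } else {
                    moved.add(e);
                }
            }
            buckets.set(split, stay);
            split++;
            if (split == (initialBuckets << level)) {
                // Every bucket of this level has been split; start a new level.
                level++;
                split = 0;
            }
        }
    }

Only the one bucket at the split pointer is reorganized on each growth 
step, so the cost of expansion is spread evenly across insertions instead 
of arriving in one big rehash.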

I would appreciate it if anyone can point me to a better external hash 
table algorithm.

The disk hash table will use overflow pages because an imperfect hash 
function will cause some buckets to get more than their share of 
entries, and because we may have duplicate keys.
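
Roughly, each bucket would then be a primary page plus a chain of overflow
pages, something like the following (an illustration of the idea only, not
Derby's actual page format):

    import java.util.ArrayList;
    import java.util.List;

    // Rough sketch of one bucket with an overflow chain. When the primary
    // page fills up (because of hash skew or duplicate keys), further
    // entries go to a chained overflow page.
    public class OverflowBucketSketch<K, V> {

        static final int PAGE_CAPACITY = 16; // placeholder entries-per-page

        static class Page<K, V> {
            final List<K> keys = new ArrayList<>();
            final List<V> values = new ArrayList<>();
            Page<K, V> overflow;              // next page in the chain, or null
        }

        private final Page<K, V> primary = new Page<>();

        public void add(K key, V value) {
            // Walk to the first page with room, allocating overflow pages as needed.
            Page<K, V> page = primary;
            while (page.keys.size() >= PAGE_CAPACITY) {
                if (page.overflow == null) {
                    page.overflow = new Page<>();
                }
                page = page.overflow;
            }
            page.keys.add(key);
            page.values.add(value);
        }

        // Duplicate keys are allowed, so a probe collects every matching value
        // across the whole chain.
        public List<V> get(K key) {
            List<V> result = new ArrayList<>();
            for (Page<K, V> page = primary; page != null; page = page.overflow) {
                for (int i = 0; i < page.keys.size(); i++) {
                    if (page.keys.get(i).equals(key)) {
                        result.add(page.values.get(i));
                    }
                }
            }
            return result;
        }
    }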

I have not yet looked at how the optimizer handles hash joins, so I do 
not yet have a proposal for how to change that.

Can anyone explain how to generate a cost for the Derby optimizer? How 
do you compute it from an estimated number of disk accesses? Does an 
estimate of 2 disk accesses mean a cost of 2.0? Should the cost estimate 
include CPU time? If so, how do you estimate it relative to disk access 
cost? Putting it another way, what does a cost estimate of 1.0 (or x) mean?

Comments, criticisms, questions?

Jack Klebanoff
