db-derby-dev mailing list archives

Site index · List index
Message view « Date » · « Thread »
Top « Date » · « Thread »
From Jack Klebanoff <klebanoff-de...@sbcglobal.net>
Subject Re: BackingStoreHashtable
Date Thu, 27 Jan 2005 18:32:02 GMT
My mail server dropped RPost's reply so I had to dig it out of the 
archive. I apologize for the delay and improper threading.

RPost wrote:

>Does either 'spill to disk' approach have any other possible future use?
>Perhaps for implementing any other features or functionality on anyone's
>'nice to have' list? Or would either btree or hash implementations be useful
>for only this one purpose?
BackingStoreHashtable could be used to implement other operations such 
as INTERSECT, and EXCEPT, and in removing duplicates. I originally 
noticed that BackingStoreHashtable did not spill to disk when 
researching the implementation of INTERSECT.

A dynamic disk hash table could be used for indexes. However 
BackingStoreHashtables are transient, single thread, data structures. A 
hash index must deal with multi-thread issues while 
BackingStoreHashtable would not want to spend time on locking. 
Furthermore BackingStoreHashtables only grow until they are discarded 
entirely. A hash index must deal with shrinking. So, with respect to 
BackingStoreHashtable and hash indexes, I would say close, but no cigar.


> Jack Klebanoff wrote:
>> Derby uses class BackingStoreHashtable to implement a hash table used 
>> in hash joins. Despite the class's name it does not spill to disk; 
>> the hash table is always in memory.
>> The optimizer tries to avoid hash joins when the input is large 
>> because BackingStoreHashtable would use too much memory. The 
>> optimizer just works with row count estimates, so there is no 
>> guarantee that BackingStoreHashtable will not blow up. No customer 
>> has complained about this yet.
>> I would like to work on this, changing BackingStoreHashtable to spill 
>> to disk when the hash table gets large, and changing the optimizer to 
>> understand that large hash tables are allowed, but more costly.
>> The advantages of doing this are that
>> 1. hash joins will not blow up when the compiler's row count estimate 
>> was low,
>> 2. the optmizer can choose hash joins on large inputs when that is 
>> the least costly join implementation, and
>> 3. we can use BackingStoreHashtable to implement features such as 
>> INTERSECT and GROUP BY, though I am not proposing to do so now.
>> I am not proposing to implement a hash table that can be used as an 
>> alternative to Btrees for a secondary index. BackingStoreHashtable is 
>> used for transient data structures that are only accessed by a single 
>> thread. A secondary index implementation must deal with locking and 
>> must implement hashCode methods that are JVM independent. This is 
>> much more work and would yield a slower implementation.
>> I propose that BackingStoreHashtable should start off using an 
>> in-memory hash table even if the estimated row count is large. That 
>> way it will use an in-memory hash table when the actual row count is 
>> small enough for the table to fit in memory. When it finds that 
>> spilling to disk is necessary BackingStoreHashtable will use the 
>> estimated row count to determine the initial number of buckets and 
>> move the in-memory entries to disk. The disk based hash table will 
>> use a linear hashing algorithm, see "External Memory Algorithms and 
>> Data Structures: Dealing withMassive Data", Jeffrey Scott Vitter, ACM 
>> Computing Surveys, Vol. 33, No. 2, June 2001, pp. 209–271. It grows 
>> the hash table one bucket at a time when the average number of 
>> entries per bucket grows beyond a threshold. The disadvantage of 
>> growing by a larger number of buckets is that the last expansion may 
>> be unnecessarity large, wasting time and space.
>> I would appreciate it if anyone can point me to a better external 
>> hash table algorithm.
>> The disk hash table will use overflow pages because an imperfect hash 
>> function will cause some buckets to get more than their share of 
>> entries, and because we may have duplicate keys.
>> I have not yet looked at how the optimizer handles hash joins, so I 
>> do not yet have a proposal for how to change that.
>> Can anyone explain how to generate a cost for the Derby optimizer? 
>> How do you compute it from an estimated number of disk accesses? Does 
>> an estimate of 2 disk accesses mean a cost of 2.0? Should the cost 
>> estimate include CPU time? If so, how do you estimate it relative to 
>> disk access cost? Putting it another way, what does a cost estimate 
>> of 1.0 (or x) mean?
>> Comments, criticisms, questions?
>> Jack Klebanoff

View raw message