db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: BackingStoreHashtable
Date Fri, 10 Dec 2004 18:10:15 GMT

Some notes on costing information:

Costing information was developed when most cloudscape databases were
relatively small and query plan options were not very extensive.  The
cost numbers are currently just relative numbers; they don't represent
anything real like elapsed time or cpu time and are only meant to be
compared against each other.  The actual numbers come from running a set
of internal performance tests on a single machine a number of years ago,
comparing the relative performance of the operations for which the store
returns cost information to its clients through the SortCostController,
StoreCostController, and StoreCostResult interfaces in the
opensource/java/engine/org/apache/derby/iapi/store/access directory.
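
To make "relative" concrete, here is a minimal sketch; the cost formulas
are invented for illustration and are not the actual store models.  The
optimizer only ever compares one plan's estimate against another's, so
the units cancel out:

    // Hypothetical illustration: cost estimates are unitless and only
    // meaningful relative to each other, never as absolute times.
    public class PlanCostExample {
        // assumed model: scan the inner table once per outer row
        static double nestedLoopCost(double outerRows, double innerScanCost) {
            return outerRows * innerScanCost;
        }

        // assumed model: one pass over each input plus per-row hashing work
        static double hashJoinCost(double buildRows, double probeRows, double perRowWork) {
            return (buildRows + probeRows) * perRowWork;
        }

        public static void main(String[] args) {
            double nl = nestedLoopCost(10000, 50.0);
            double hj = hashJoinCost(10000, 100000, 1.0);
            // Only the comparison matters; neither number is milliseconds.
            System.out.println(nl < hj ? "choose nested loop join" : "choose hash join");
        }
    }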

The actual tests have not been open sourced yet, as they were implemented
as part of the "unit test" structure.  These are the next set of tests
being open sourced, and Myrna is working as we speak on getting them
through legal so they can be posted.  The tests basically just create a
heap/btree table and measure probes and scans.  They likely underreport
I/O cost, which was not such a problem in the past with small databases,
and they seemed to give correct relative weights such that good plans
were picked.
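
If anyone wants to reproduce that kind of measurement before those tests
are posted, a rough sketch using plain JDBC against an embedded derby
database would look something like the following; the database name,
table shape, row count, and probe count are arbitrary choices, not what
the internal tests use:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.ResultSet;
    import java.sql.Statement;

    // Rough probe-vs-scan timing sketch against an embedded derby database.
    public class ProbeVsScan {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.derby.jdbc.EmbeddedDriver");
            Connection c = DriverManager.getConnection("jdbc:derby:costdb;create=true");
            Statement s = c.createStatement();
            s.executeUpdate("CREATE TABLE t (id INT PRIMARY KEY, val VARCHAR(100))");

            // load some rows; the primary key gives us a btree to probe
            PreparedStatement ins = c.prepareStatement("INSERT INTO t VALUES (?, ?)");
            for (int i = 0; i < 10000; i++) {
                ins.setInt(1, i);
                ins.setString(2, "row " + i);
                ins.executeUpdate();
            }

            // time one full scan of the heap
            long t0 = System.currentTimeMillis();
            ResultSet rs = s.executeQuery("SELECT COUNT(*) FROM t");
            rs.next();
            rs.close();
            System.out.println("full scan: " + (System.currentTimeMillis() - t0) + " ms");

            // time a batch of keyed probes through the index
            PreparedStatement probe = c.prepareStatement("SELECT val FROM t WHERE id = ?");
            long t1 = System.currentTimeMillis();
            for (int i = 0; i < 1000; i++) {
                probe.setInt(1, (i * 97) % 10000);
                ResultSet pr = probe.executeQuery();
                pr.next();
                pr.close();
            }
            System.out.println("1000 keyed probes: " + (System.currentTimeMillis() - t1) + " ms");

            c.close();
        }
    }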

Your proposal would make it important to cost I/Os more realistically,
as the optimizer would now be choosing between plans, some of which might
generate drastically more I/O than others.  So it seems it is time to
update the costing information.  This was due anyway, as the numbers have
not been rerun for recent releases and performance dynamics may have
changed.

I have worked on systems which instead just assumed an elapsed time for
an I/O and then added that to an estimated cpu cost for the operation.
Another idea that was discussed at cloudscape was to somehow come up
with estimates that matched the machine you were running on, such that
the estimates were actually attempting to be real time numbers for the
machine the server was running on at the time.  We never came up with
the best way to do this.  If the units of the costing are changed, we
should come up with some way such that, in the future, if derby supports
some sort of external data source, that data source can provide the
costing information.
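
As a rough illustration of that first approach, something like the
sketch below could work; both constants are invented placeholders rather
than anything derby actually uses, and a machine-calibrated variant
would measure them at startup instead of hard-coding them:

    // Hypothetical cost model: charge a fixed assumed elapsed time per I/O
    // and add an estimated per-row cpu cost.  Both constants are invented.
    public class IoPlusCpuCost {
        static final double ASSUMED_MILLIS_PER_IO = 10.0;    // invented value
        static final double ASSUMED_MILLIS_PER_ROW = 0.001;  // invented value

        static double estimate(long estimatedIos, long estimatedRows) {
            return estimatedIos * ASSUMED_MILLIS_PER_IO
                 + estimatedRows * ASSUMED_MILLIS_PER_ROW;
        }

        public static void main(String[] args) {
            // e.g. a scan expected to touch 500 pages and 100000 rows
            System.out.println("estimated cost (ms): " + estimate(500, 100000));
        }
    }
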
Jack Klebanoff wrote:
| Derby uses class BackingStoreHashtable to implement a hash table used in
| hash joins. Despite the class's name it does not spill to disk; the hash
| table is always in memory.
| The optimizer tries to avoid hash joins when the input is large because
| BackingStoreHashtable would use too much memory. The optimizer just
| works with row count estimates, so there is no guarantee that
| BackingStoreHashtable will not blow up. No customer has complained about
| this yet.
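
(For anyone not familiar with the term, a hash join builds an in-memory
hash table on the smaller input and then probes it once per row of the
larger input.  A minimal illustration, with made-up inputs and join key:)

    import java.util.HashMap;
    import java.util.Map;

    // Minimal hash join illustration: build on the smaller input,
    // probe with each row of the larger input.
    public class HashJoinSketch {
        public static void main(String[] args) {
            // smaller (build) input: department id -> department name
            Map<Integer, String> build = new HashMap<Integer, String>();
            build.put(1, "Eng");
            build.put(2, "Sales");

            // larger (probe) input: employee name, department id
            String[] empNames = { "Ann", "Bob", "Cat" };
            int[] empDepts = { 1, 2, 1 };

            // probe phase: one hash lookup per row of the larger input
            for (int i = 0; i < empNames.length; i++) {
                String dept = build.get(empDepts[i]);
                if (dept != null) {
                    System.out.println(empNames[i] + " works in " + dept);
                }
            }
        }
    }
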
| I would like to work on this, changing BackingStoreHashtable to spill to
| disk when the hash table gets large, and changing the optimizer to
| understand that large hash tables are allowed, but more costly.
| The advantages of doing this are that
| 1. hash joins will not blow up when the compiler's row count estimate
| was low,
| 2. the optimizer can choose hash joins on large inputs when that is the
| least costly join implementation, and
| 3. we can use BackingStoreHashtable to implement features such as
| INTERSECT and GROUP BY, though I am not proposing to do so now.
| I am not proposing to implement a hash table that can be used as an
| alternative to Btrees for a secondary index. BackingStoreHashtable is
| used for transient data structures that are only accessed by a single
| thread. A secondary index implementation must deal with locking and must
| implement hashCode methods that are JVM independent. This is much more
| work and would yield a slower implementation.
| I propose that BackingStoreHashtable should start off using an in-memory
| hash table even if the estimated row count is large. That way it will
| use an in-memory hash table when the actual row count is small enough
| for the table to fit in memory. When it finds that spilling to disk is
| necessary BackingStoreHashtable will use the estimated row count to
| determine the initial number of buckets and move the in-memory entries
| to disk. The disk-based hash table will use a linear hashing algorithm;
| see "External Memory Algorithms and Data Structures: Dealing with
| Massive Data", Jeffrey Scott Vitter, ACM Computing Surveys, Vol. 33,
| No. 2, June 2001, pp. 209–271. It grows the hash table one bucket at a
| time when the average number of entries per bucket grows beyond a
| threshold. The disadvantage of growing by a larger number of buckets at
| a time is that the last expansion may be unnecessarily large, wasting
| time and space.
| I would appreciate it if anyone can point me to a better external hash
| table algorithm.
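
(A compact in-memory illustration of the linear hashing growth rule
described above: split exactly one bucket at a time, controlled by a
level and a next-bucket-to-split pointer.  The overflow pages and the
spilling to disk are omitted, and all names here are made up:)

    import java.util.ArrayList;
    import java.util.List;

    // Linear hashing growth sketch: start with a few buckets and split
    // exactly one bucket (the one pointed to by "next") whenever the
    // average number of entries per bucket exceeds a threshold.
    public class LinearHashSketch {
        static class Entry {
            Object key; Object value;
            Entry(Object k, Object v) { key = k; value = v; }
        }

        private static final int INITIAL_BUCKETS = 4;
        private static final double MAX_AVG_LOAD = 2.0;

        private final List<List<Entry>> buckets = new ArrayList<List<Entry>>();
        private int level = 0;  // current doubling round
        private int next = 0;   // next bucket to split in this round
        private int size = 0;

        public LinearHashSketch() {
            for (int i = 0; i < INITIAL_BUCKETS; i++) {
                buckets.add(new ArrayList<Entry>());
            }
        }

        private int bucketFor(Object key) {
            int h = key.hashCode() & 0x7fffffff;
            int b = h % (INITIAL_BUCKETS << level);         // hash at current level
            if (b < next) {                                 // bucket already split,
                b = h % (INITIAL_BUCKETS << (level + 1));   // use next level's hash
            }
            return b;
        }

        public void put(Object key, Object value) {
            buckets.get(bucketFor(key)).add(new Entry(key, value));
            size++;
            if ((double) size / buckets.size() > MAX_AVG_LOAD) {
                splitOneBucket();
            }
        }

        // returns all values for the key; duplicate keys are allowed
        public List<Object> get(Object key) {
            List<Object> out = new ArrayList<Object>();
            for (Entry e : buckets.get(bucketFor(key))) {
                if (e.key.equals(key)) out.add(e.value);
            }
            return out;
        }

        private void splitOneBucket() {
            buckets.add(new ArrayList<Entry>());            // one new bucket at the end
            List<Entry> old = buckets.set(next, new ArrayList<Entry>());
            next++;
            if (next == (INITIAL_BUCKETS << level)) { level++; next = 0; }
            // redistribute the split bucket's entries between its old slot
            // and the newly added bucket using the higher-level hash
            for (Entry e : old) {
                buckets.get(bucketFor(e.key)).add(e);
            }
        }
    }
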
| The disk hash table will use overflow pages because an imperfect hash
| function will cause some buckets to get more than their share of
| entries, and because we may have duplicate keys.
| I have not yet looked at how the optimizer handles hash joins, so I do
| not yet have a proposal for how to change that.
| Can anyone explain how to generate a cost for the Derby optimizer? How
| do you compute it from an estimated number of disk accesses? Does an
| estimate of 2 disk accesses mean a cost of 2.0? Should the cost estimate
| include CPU time? If so, how do you estimate it relative to disk access
| cost? Putting it another way, what does a cost estimate of 1.0 (or x)
| mean?
| Comments, criticisms, questions?
| Jack Klebanoff