db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: BackingStoreHashtable
Date Fri, 10 Dec 2004 18:30:20 GMT

On using large amounts of memory for queries:

I actually believe that the way the Derby optimizer/language uses hash
joins is optimal until we can get a better handle on memory allocation
to individual dbms clients within the dbms server.  The optimizer tries
to limit the memory allocation for a single query to a "reasonable"
amount and estimates whether it can perform a hash join with all the
rows in memory.  If it can, it chooses a hash join and currently always
puts all the rows in memory for the join.  If it runs out of memory then
the query fails, just as it would if it ran out of memory while
allocating any other structure for any other query execution.  The
system does a similar thing for sorting: it has an in-memory limit, and
if it estimates the sort can't fit in memory it picks an alternative
that uses less memory.  In memory is obviously faster.
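
Roughly, the kind of check involved looks like this (a sketch in Java
only, not the actual optimizer code; the per-query limit and the row
estimates below are made-up names):

public final class JoinStrategyChooser {

    enum Strategy { HASH_JOIN, NESTED_LOOP_JOIN }

    // Pick a join strategy from the optimizer's estimates.  All three
    // arguments are assumptions for the sketch, not real Derby settings.
    static Strategy choose(long estimatedRowCount,
                           long estimatedRowSizeBytes,
                           long maxMemoryPerQueryBytes) {
        long estimatedBytes = estimatedRowCount * estimatedRowSizeBytes;
        if (estimatedBytes <= maxMemoryPerQueryBytes) {
            // Estimated to fit: build the whole inner table in memory
            // and join against the in-memory hash table.
            return Strategy.HASH_JOIN;
        }
        // Estimated too big for the per-query budget: pick a plan that
        // uses less memory, accepting that it is usually slower.
        return Strategy.NESTED_LOOP_JOIN;
    }

    public static void main(String[] args) {
        // 10,000 rows of ~200 bytes against a 1 MB budget -> hash join.
        System.out.println(choose(10_000L, 200L, 1_048_576L));
        // 1,000,000 rows of ~200 bytes -> falls back to a cheaper plan.
        System.out.println(choose(1_000_000L, 200L, 1_048_576L));
    }
}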

The problem with using unbounded memory is that one client in a
100-client system can use up most of the memory and cause the other 99
clients to fail even though their memory requirements are small.

For a zero-admin database, Cloudscape chose a conservative approach to
memory usage in most decisions, including hash joins and sorting.

I do agree that it would be better if we could recover from
out-of-memory conditions.  Unfortunately the Java interfaces are fairly
lacking in this area.  Once you run out of memory it is too late to do
much, as it will wreak havoc in many execution paths of a multi-threaded
server.
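
A small illustration of the point (not Derby code): Java lets you catch
the error, but that does not give you much to work with.

public final class OomeIllustration {
    public static void main(String[] args) {
        long[] rows = null;
        try {
            // Deliberately request far more than the heap can supply.
            rows = new long[Integer.MAX_VALUE];
        } catch (OutOfMemoryError oome) {
            // By the time the error is thrown the heap is exhausted, so
            // even this cleanup/logging may fail, and in a multi-threaded
            // server other threads may already have died at arbitrary
            // points in their execution.
            System.err.println("allocation failed: " + oome);
        }
        System.out.println(rows == null ? "no rows" : "allocated");
    }
}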

If you are interested in working on completing the implementation of
BackingStoreHashtable, of course it would be better for it to work as
documented.  Cloudscape just never had the resources to complete it, as
it was always too low a priority, given that the optimizer never called
it expecting the overflow to be important.  It was one of those cases
where an interface was provided for a future need but never implemented.

I do think more discussion is needed before changes are made to increase
the size of hash joins to the point where overflow is necessary.
Currently I see overflow as an edge case where the optimizer was wrong;
it would be better to execute the query slowly than to fail.  Again, as
you noted, I can't remember an application problem being reported
because of the BackingStoreHashtable issue.

Have you considered implementing INTERSECT similarly to how hash join is
currently used in Derby?  Have the optimizer pick a hash join to
implement INTERSECT if the number of rows is small and your new sort
merge if the number of rows is large?
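
Something like the following dispatch, as a rough sketch only (not Derby
code; the row threshold is hypothetical, and the in-memory hash set and
Collections.sort just stand in for the real hash join and external sort):

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public final class IntersectSketch {

    // Return the distinct rows present in both inputs, picking the
    // strategy from a (made-up) in-memory row limit.
    static <T extends Comparable<T>> List<T> intersect(List<T> left,
                                                       List<T> right,
                                                       long inMemoryRowLimit) {
        if (Math.min(left.size(), right.size()) <= inMemoryRowLimit) {
            // Small input: hash the smaller side and probe with the
            // larger one, the way a hash join builds and probes.
            List<T> small = left.size() <= right.size() ? left : right;
            List<T> large = (small == left) ? right : left;
            Set<T> seen = new HashSet<>(small);
            List<T> out = new ArrayList<>();
            for (T row : large) {
                if (seen.remove(row)) {  // remove so each value appears once
                    out.add(row);
                }
            }
            return out;
        }
        // Large input: sort both sides and merge.  In Derby this would be
        // an external (disk-backed) sort rather than Collections.sort.
        List<T> a = new ArrayList<>(left);
        List<T> b = new ArrayList<>(right);
        Collections.sort(a);
        Collections.sort(b);
        List<T> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i).compareTo(b.get(j));
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                T v = a.get(i);
                out.add(v);                             // emit the match once
                while (i < a.size() && a.get(i).compareTo(v) == 0) i++;
                while (j < b.size() && b.get(j).compareTo(v) == 0) j++;
            }
        }
        return out;
    }
}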



Jack Klebanoff wrote:
| Derby uses class BackingStoreHashtable to implement a hash table used in
| hash joins. Despite the class's name it does not spill to disk; the hash
| table is always in memory.
|
| The optimizer tries to avoid hash joins when the input is large because
| BackingStoreHashtable would use too much memory. The optimizer just
| works with row count estimates, so there is no guarantee that
| BackingStoreHashtable will not blow up. No customer has complained about
| this yet.
|
| I would like to work on this, changing BackingStoreHashtable to spill to
| disk when the hash table gets large, and changing the optimizer to
| understand that large hash tables are allowed, but more costly.
|
| The advantages of doing this are that
| 1. hash joins will not blow up when the compiler's row count estimate
| was low,
| 2. the optimizer can choose hash joins on large inputs when that is the
| least costly join implementation, and
| 3. we can use BackingStoreHashtable to implement features such as
| INTERSECT and GROUP BY, though I am not proposing to do so now.
|
| I am not proposing to implement a hash table that can be used as an
| alternative to Btrees for a secondary index. BackingStoreHashtable is
| used for transient data structures that are only accessed by a single
| thread. A secondary index implementation must deal with locking and must
| implement hashCode methods that are JVM independent. This is much more
| work and would yield a slower implementation.
|
| I propose that BackingStoreHashtable should start off using an in-memory
| hash table even if the estimated row count is large. That way it will
| use an in-memory hash table when the actual row count is small enough
| for the table to fit in memory. When it finds that spilling to disk is
| necessary BackingStoreHashtable will use the estimated row count to
| determine the initial number of buckets and move the in-memory entries
| to disk. The disk based hash table will use a linear hashing algorithm,
| see "External Memory Algorithms and Data Structures: Dealing withMassive
| Data", Jeffrey Scott Vitter, ACM Computing Surveys, Vol. 33, No. 2, June
| 2001, pp. 209–271. It grows the hash table one bucket at a time when the
| average number of entries per bucket grows beyond a threshold. The
| disadvantage of growing by a larger number of buckets is that the last
| expansion may be unnecessarily large, wasting time and space.
|
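
For illustration, a minimal in-memory sketch of that one-bucket-at-a-time
linear hashing growth (not the proposed Derby implementation, which would
keep each bucket on disk pages and use overflow pages for skewed buckets
and duplicate keys; the initial bucket count and split threshold below
are made up):

import java.util.ArrayList;
import java.util.List;

public final class LinearHashSketch<K, V> {

    private static final int INITIAL_BUCKETS = 4;          // assumption
    private static final double MAX_AVG_PER_BUCKET = 2.0;  // split threshold

    private static final class Entry<K, V> {
        final K key;
        final V value;
        Entry(K key, V value) { this.key = key; this.value = value; }
    }

    private final List<List<Entry<K, V>>> buckets = new ArrayList<>();
    private int level = 0;  // how many full doubling rounds have finished
    private int next = 0;   // next bucket to split in the current round
    private int size = 0;

    public LinearHashSketch() {
        for (int i = 0; i < INITIAL_BUCKETS; i++) {
            buckets.add(new ArrayList<Entry<K, V>>());
        }
    }

    private int bucketFor(K key) {
        int h = key.hashCode() & 0x7fffffff;
        int b = h % (INITIAL_BUCKETS << level);
        if (b < next) {
            // This bucket was already split in the current round, so use
            // the finer-grained hash function.
            b = h % (INITIAL_BUCKETS << (level + 1));
        }
        return b;
    }

    public void put(K key, V value) {
        buckets.get(bucketFor(key)).add(new Entry<K, V>(key, value));
        size++;
        // Grow one bucket at a time once the average occupancy passes the
        // threshold, so each expansion is never much bigger than needed.
        if ((double) size / buckets.size() > MAX_AVG_PER_BUCKET) {
            splitOneBucket();
        }
    }

    private void splitOneBucket() {
        List<Entry<K, V>> old = buckets.get(next);
        buckets.add(new ArrayList<Entry<K, V>>());  // one new bucket
        next++;
        if (next == (INITIAL_BUCKETS << level)) {
            level++;                                // finished a round
            next = 0;
        }
        // Rehash only the entries of the bucket that was split; each one
        // lands either back in place or in the newly added bucket.
        List<Entry<K, V>> moved = new ArrayList<Entry<K, V>>(old);
        old.clear();
        for (Entry<K, V> e : moved) {
            buckets.get(bucketFor(e.key)).add(e);
        }
    }

    public V get(K key) {
        for (Entry<K, V> e : buckets.get(bucketFor(key))) {
            if (e.key.equals(key)) {
                return e.value;
            }
        }
        return null;
    }
}
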
| I would appreciate it if anyone can point me to a better external hash
| table algorithm.
|
| The disk hash table will use overflow pages because an imperfect hash
| function will cause some buckets to get more than their share of
| entries, and because we may have duplicate keys.
|
| I have not yet looked at how the optimizer handles hash joins, so I do
| not yet have a proposal for how to change that.
|
| Can anyone explain how to generate a cost for the Derby optimizer? How
| do you compute it from an estimated number of disk accesses? Does an
| estimate of 2 disk accesses mean a cost of 2.0? Should the cost estimate
| include CPU time? If so, how do you estimate it relative to disk access
| cost? Putting it another way, what does a cost estimate of 1.0 (or x)
| mean?
|
| Comments, criticisms, questions?
|
| Jack Klebanoff
|