db-derby-dev mailing list archives

From Mike Matrigali <mikem_...@sbcglobal.net>
Subject Re: [PATCH] BackingStoreHashtable
Date Mon, 28 Feb 2005 19:26:37 GMT
Any idea if many more queries will now spill to disk?  I am worried that
many more will, and that those queries will now run slower.  It seems
like there should be a better connection between the optimizer's estimate
of whether the hash fits in memory and what is actually done at
execution time.  As I understand it now, the optimizer costs a hash join
only as in-memory; if its estimates say it won't fit into memory, it
chooses a different plan.  I assume the other plan is more efficient than
a hash plan that may have to go to disk for every probe (even if pages
are cached, going to latched pages on every probe is a lot more expensive
than an in-memory hash).
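
Schematically, the costing behavior I am describing (illustrative names
only, not the actual optimizer code):

    // Sketch of today's hash-join costing as I understand it.
    // estimatedSize() and the plan methods are made-up names.
    if (estimatedSize(innerTable) <= maxInMemoryCapacity) {
        plan = hashJoinPlan(innerTable);   // costed as purely in-memory
    } else {
        plan = alternatePlan(innerTable);  // e.g. nested loop join
    }
    // With spilling, the first branch can now under-cost the hash plan:
    // a spilled table pays latched-page probes that the in-memory cost
    // model never sees.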

My guess is that no one did the work to set max_inmemory_rowcnt because
the backing store spilling wasn't implemented yet.

I have not read the code yet.  Is the 1% measured against all memory, or
against free memory?


Jack Klebanoff wrote:
> I have attached a patch that causes BackingStoreHashtable to spill to 
> disk when it gets large. BackingStoreHashtable is used to implement hash 
> joins, DISTINCT, scroll insensitive cursors, and the HAVING clause. The 
> unpatched BackingStoreHashtable never spills to disk, which can cause 
> Derby to run out of memory. See Jira report Derby-106.
> 
> One of the arguments of the BackingStoreHashtable constructor is the 
> maximum number of rows to store in memory. If this argument is positive 
> then the patched BackingStoreHashtable spills to disk when more than 
> that number of rows are added to the hash table.
> 
> If the max_inmemory_rowcnt argument is zero or negative then 
> BackingStoreHashtable decides itself when to spill to disk. It does so 
> when its estimate of the size of the hash table (in bytes) grows larger 
> than 1% of the JVM's total memory, as measured when the 
> BackingStoreHashtable was constructed. Currently Derby only constructs 
> BackingStoreHashtables with max_inmemory_rowcnt = -1, so this mechanism 
> is always used to decide when to spill.
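> 
> To make the decision concrete, here is the logic condensed from the 
> patch below (constructor and spillToDisk; field names as in the patch):
> 
>     // Constructor: pick which budget governs spilling.
>     if (max_inmemory_rowcnt > 0)
>         max_inmemory_size = Long.MAX_VALUE; // the row count governs
>     else // 1% of total JVM memory at construction time
>         max_inmemory_size = Runtime.getRuntime().totalMemory() / 100;
> 
>     // spillToDisk(), before any spilling has started:
>     if (max_inmemory_rowcnt > 0) {
>         if (inmemory_rowcnt < max_inmemory_rowcnt)
>             return false; // under the row budget; keep in memory
>     } else if (max_inmemory_size > 0) {
>         return false;     // byte budget not yet exhausted
>     }
>     // ... otherwise create the DiskHashtable and put the row on disk.
> 
> (Each in-memory insert decrements max_inmemory_size by the row's 
> estimated size, so spilling starts once the 1% budget is used up.)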
> 
> The disk portion of the hash table is handled in class DiskHashtable. 
> This does not implement a dynamic hash table data structure. Instead it 
> is implemented using an idea of Mike Matrigali's. The rows are stored in 
> a standard heap conglomerate, the same structure used by ordinary Derby 
> tables. A Btree is used to index the rows by hash code. We cannot index 
> them by their actual keys because our Btree implementation limits the 
> length of a key. To find a record by key, DiskHashtable scans the Btree 
> for all rows with the same hash code and matches the retrieved rows 
> against the target key.
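> 
> In outline, a lookup proceeds as follows (condensed from 
> DiskHashtable.getRemove in the patch below):
> 
>     scanKey[0].setValue(key.hashCode());
>     // Scan the Btree for every entry with this hash code.
>     ScanController scan = tc.openScan(btreeConglomerateId, /* ... */
>                                       scanKey, ScanController.GE,
>                                       (Qualifier[][]) null,
>                                       scanKey, ScanController.GT);
>     while (scan.fetchNext(btreeRow)) {
>         // btreeRow[1] holds the RowLocation of the row in the heap.
>         if (rowConglomerate.fetch((RowLocation) btreeRow[1], row, null)
>             && rowMatches(row, key)) // hash codes collide; re-check key
>         {
>             // collect the row (single row, or a Vector of duplicates)
>         }
>     }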
> 
> The advantage of this method is that it requires little new code 
> because it reuses the existing heap and btree conglomerate 
> implementations. The disadvantage is that it is slower than a true 
> dynamic hash table structure. We expect that in most cases 
> BackingStoreHashtable will not spill to disk, so this trade-off seems 
> appropriate.
> 
> Issues and Future Work
> ------------------------
> 
> The Derby community may want to consider some follow on work.
> 
> Hash join costing should be re-evaluated now that BackingStoreHashtable 
> can spill to disk. The optimizer can consider using hash joins on larger 
> tables, but it should take the cost of disk access into account. This 
> leads to a larger issue of optimizer costing. Our cost numbers were 
> derived many years ago by timing various operations on a particular 
> machine. That machine is probably no longer available for timing 
> BackingStoreHashtable, and in any case it is obsolete. We should revise 
> all of our costing numbers on a more modern machine.
> 
> We may want to implement a real dynamic disk hash structure to improve 
> the speed of BackingStoreHashtable when it has spilled to disk. If it 
> were faster we could use hash joins more often, potentially improving 
> the speed of some Derby joins. Furthermore, BackingStoreHashtable is 
> used elsewhere, so our assumption that it will seldom spill to disk may 
> not be correct.
> 
> In my implementation BackingStoreHashtable keeps old rows in memory 
> after it decides to spill new rows to disk. Mike Matrigali suggested 
> that it should move the old rows to disk to reduce memory usage. That 
> would come at the price of increasing both the time to populate a 
> BackingStoreHashtable and the time to access an element, which is why I 
> chose not to move old rows to disk.
> 
> Jack Klebanoff
> 
> 
> ------------------------------------------------------------------------
> 
> Index: java/engine/org/apache/derby/impl/sql/execute/ScrollInsensitiveResultSet.java
> ===================================================================
> --- java/engine/org/apache/derby/impl/sql/execute/ScrollInsensitiveResultSet.java	(revision 155029)
> +++ java/engine/org/apache/derby/impl/sql/execute/ScrollInsensitiveResultSet.java	(working copy)
> @@ -66,7 +66,6 @@
>  
>  
>  	private int							sourceRowWidth;
> -	private TransactionController		tc;
>  
>  	private	  BackingStoreHashtable		ht;
>  	private	  ExecRow					resultRow;
> @@ -87,6 +86,8 @@
>  
>      private GeneratedMethod closeCleanup;
>  
> +    private boolean keepAfterCommit;
> +
>  	/**
>  	 * Constructor for a ScrollInsensitiveResultSet
>  	 *
> @@ -110,6 +111,7 @@
>  			  optimizerEstimatedRowCount, optimizerEstimatedCost);
>  		this.source = source;
>  		this.sourceRowWidth = sourceRowWidth;
> +        keepAfterCommit = activation.getResultSetHoldability();
>  		maxRows = activation.getMaxRows();
>  		if (SanityManager.DEBUG)
>  		{
> @@ -160,7 +162,7 @@
>  		 * We need BackingStoreHashtable to actually go to disk when it doesn't fit.
>  		 * This is a known limitation.
>  		 */
> -		ht = new BackingStoreHashtable(tc,
> +		ht = new BackingStoreHashtable(getTransactionController(),
>  									   null,
>  									   keyCols,
>  									   false,
> @@ -168,7 +170,8 @@
>  									   HashScanResultSet.DEFAULT_MAX_CAPACITY,
>  									   HashScanResultSet.DEFAULT_INITIAL_CAPACITY,
>  									   HashScanResultSet.DEFAULT_MAX_CAPACITY,
> -									   false);
> +									   false,
> +                                       keepAfterCommit);
>  
>  		openTime += getElapsedMillis(beginTime);
>  		setBeforeFirstRow();
> Index: java/engine/org/apache/derby/impl/sql/execute/HashTableResultSet.java
> ===================================================================
> --- java/engine/org/apache/derby/impl/sql/execute/HashTableResultSet.java	(revision 155029)
> +++ java/engine/org/apache/derby/impl/sql/execute/HashTableResultSet.java	(working copy)
> @@ -221,7 +221,8 @@
>  										   maxInMemoryRowCount,
>  										   (int) initialCapacity,
>  										   loadFactor,
> -										   skipNullKeyColumns);
> +										   skipNullKeyColumns,
> +                                           false /* Not kept after a commit */);
>  
>  			if (runTimeStatsOn)
>  			{
> Index: java/engine/org/apache/derby/impl/store/access/BackingStoreHashTableFromScan.java
> ===================================================================
> --- java/engine/org/apache/derby/impl/store/access/BackingStoreHashTableFromScan.java	(revision 155029)
> +++ java/engine/org/apache/derby/impl/store/access/BackingStoreHashTableFromScan.java	(working copy)
> @@ -97,7 +97,8 @@
>              max_inmemory_rowcnt,
>              initialCapacity,
>              loadFactor,
> -			skipNullKeyColumns);
> +			skipNullKeyColumns,
> +            false /* Do not keep the hash table after a commit. */);
>  
>          open_scan =  (ScanManager)
>              tc.openScan(
> Index: java/engine/org/apache/derby/iapi/store/access/DiskHashtable.java
> ===================================================================
> --- java/engine/org/apache/derby/iapi/store/access/DiskHashtable.java	(revision 0)
> +++ java/engine/org/apache/derby/iapi/store/access/DiskHashtable.java	(revision 0)
> @@ -0,0 +1,377 @@
> +/*
> +
> +   Derby - Class org.apache.derby.iapi.store.access.DiskHashtable
> +
> +   Copyright 2005 The Apache Software Foundation or its licensors, as applicable.
> +
> +   Licensed under the Apache License, Version 2.0 (the "License");
> +   you may not use this file except in compliance with the License.
> +   You may obtain a copy of the License at
> +
> +      http://www.apache.org/licenses/LICENSE-2.0
> +
> +   Unless required by applicable law or agreed to in writing, software
> +   distributed under the License is distributed on an "AS IS" BASIS,
> +   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> +   See the License for the specific language governing permissions and
> +   limitations under the License.
> +
> + */
> +
> +package org.apache.derby.iapi.store.access;
> +
> +import java.util.Enumeration;
> +import java.util.NoSuchElementException;
> +import java.util.Properties;
> +import java.util.Vector;
> +import org.apache.derby.iapi.error.StandardException;
> +import org.apache.derby.iapi.services.io.FormatableBitSet;
> +import org.apache.derby.iapi.types.DataValueDescriptor;
> +import org.apache.derby.iapi.types.SQLInteger;
> +import org.apache.derby.impl.store.access.heap.HeapRowLocation;
> +import org.apache.derby.iapi.types.RowLocation;
> +import org.apache.derby.iapi.services.context.ContextService;
> +import org.apache.derby.iapi.sql.conn.LanguageConnectionContext;
> +
> +/**
> + * This class is used by BackingStoreHashtable when the BackingStoreHashtable must spill to disk.
> + * It implements the methods of a hash table (put, get, remove, elements); however, it is not implemented
> + * as a hash table. In order to minimize the amount of new code it is implemented using a Btree and a heap
> + * conglomerate. The Btree indexes the hash code of the row key; the actual key may be too long for
> + * our Btree implementation.
> + *
> + * Created: Fri Jan 28 13:58:03 2005
> + *
> + * @author <a href="mailto:klebanof@us.ibm.com">Jack Klebanoff</a>
> + * @version 1.0
> + */
> +
> +public class DiskHashtable 
> +{
> +    private final long rowConglomerateId;
> +    private ConglomerateController rowConglomerate;
> +    private final long btreeConglomerateId;
> +    private ConglomerateController btreeConglomerate;
> +    private final DataValueDescriptor[] btreeRow;
> +    private final int[] key_column_numbers;
> +    private final boolean remove_duplicates;
> +    private final TransactionController tc;
> +    private final DataValueDescriptor[] row;
> +    private final DataValueDescriptor[] scanKey = { new SQLInteger()};
> +    private int size;
> +    private boolean keepStatistics;
> +
> +    /**
> +     * Creates a new <code>DiskHashtable</code> instance.
> +     *
> +     * @param tc
> +     * @param template An array of DataValueDescriptors that serves as a template for the rows.
> +     * @param key_column_numbers The indexes of the key columns (0 based)
> +     * @param remove_duplicates If true then rows with duplicate keys are removed
> +     * @param keepAfterCommit If true then the hash table is kept after a commit
> +     */
> +    public DiskHashtable( TransactionController tc,
> +                          DataValueDescriptor[] template,
> +                          int[] key_column_numbers,
> +                          boolean remove_duplicates,
> +                          boolean keepAfterCommit)
> +        throws StandardException
> +    {
> +        this.tc = tc;
> +        this.key_column_numbers = key_column_numbers;
> +        this.remove_duplicates = remove_duplicates;
> +        LanguageConnectionContext lcc = (LanguageConnectionContext)
> +				ContextService.getContextOrNull(LanguageConnectionContext.CONTEXT_ID);
> +        keepStatistics = (lcc != null) && lcc.getRunTimeStatisticsMode();
> +        row = new DataValueDescriptor[ template.length];
> +        for( int i = 0; i < row.length; i++)
> +            row[i] = template[i].getNewNull();
> +        int tempFlags = keepAfterCommit ? (TransactionController.IS_TEMPORARY | TransactionController.IS_KEPT)
> +          : TransactionController.IS_TEMPORARY;
> +        
> +        rowConglomerateId = tc.createConglomerate( "heap",
> +                                                   template,
> +                                                   (ColumnOrdering[]) null,
> +                                                   (Properties) null,
> +                                                   tempFlags);
> +        rowConglomerate = tc.openConglomerate( rowConglomerateId,
> +                                               false, // Do not hold past the end of transaction
> +                                               TransactionController.OPENMODE_FORUPDATE,
> +                                               TransactionController.MODE_TABLE,
> +                                               TransactionController.ISOLATION_NOLOCK /* Single thread only */ );
> +
> +        btreeRow = new DataValueDescriptor[] { new SQLInteger(), rowConglomerate.newRowLocationTemplate()};
> +        Properties btreeProps = new Properties();
> +        btreeProps.put( "baseConglomerateId", String.valueOf( rowConglomerateId));
> +        btreeProps.put( "rowLocationColumn", "1");
> +        btreeProps.put( "allowDuplicates", "false"); // Because the row location is part of the key
> +        btreeProps.put( "nKeyFields", "2"); // Include the row location column
> +        btreeProps.put( "nUniqueColumns", "2"); // Include the row location column
> +        btreeProps.put( "maintainParentLinks", "false");
> +        btreeConglomerateId = tc.createConglomerate( "BTREE",
> +                                                     btreeRow,
> +                                                     (ColumnOrdering[]) null,
> +                                                     btreeProps,
> +                                                     tempFlags);
> +
> +        btreeConglomerate = tc.openConglomerate( btreeConglomerateId,
> +                                                 false, // Do not hold past the end of transaction
> +                                                 TransactionController.OPENMODE_FORUPDATE,
> +                                                 TransactionController.MODE_TABLE,
> +                                                 TransactionController.ISOLATION_NOLOCK /* Single thread only */ );
> +    } // end of constructor
> +
> +    public void close() throws StandardException
> +    {
> +        btreeConglomerate.close();
> +        rowConglomerate.close();
> +        tc.dropConglomerate( btreeConglomerateId);
> +        tc.dropConglomerate( rowConglomerateId);
> +    } // end of close
> +    
> +    /**
> +     * Put a new row in the overflow structure.
> +     *
> +     * @param key The row's key: the key column value, or a KeyHasher if there is more than one key column.
> +     * @param row The row to be inserted.
> +     *
> +     * @return true if the row was added,
> +     *         false if it was not added (because it was a duplicate and we are eliminating duplicates).
> +     *
> +     * @exception StandardException standard error policy
> +     */
> +    public boolean put( Object key, Object[] row)
> +        throws StandardException
> +    {
> +        boolean isDuplicate = false;
> +        if( remove_duplicates || keepStatistics)
> +        {
> +            // Go to the work of finding out whether it is a duplicate
> +            isDuplicate = (getRemove( key, false, true) != null);
> +            if( remove_duplicates && isDuplicate)
> +                return false;
> +        }
> +        rowConglomerate.insertAndFetchLocation( (DataValueDescriptor[]) row, (RowLocation) btreeRow[1]);
> +        btreeRow[0].setValue( key.hashCode());
> +        btreeConglomerate.insert( btreeRow);
> +        if( keepStatistics && !isDuplicate)
> +            size++;
> +        return true;
> +    } // end of put
> +
> +    /**
> +     * Get a row from the overflow structure.
> +     *
> +     * @param key If the rows have only one key column, the key value; if there is more than one
> +     *            key column, a KeyHasher.
> +     *
> +     * @return null if there is no corresponding row;
> +     *         the row (DataValueDescriptor[]) if there is exactly one row with the key;
> +     *         a Vector of all the rows with the key if there is more than one.
> +     *
> +     * @exception StandardException
> +     */
> +    public Object get( Object key)
> +        throws StandardException
> +    {
> +        return getRemove( key, false, false);
> +    }
> +
> +    private Object getRemove( Object key, boolean remove, boolean existenceOnly)
> +        throws StandardException
> +    {
> +        int hashCode = key.hashCode();
> +        int rowCount = 0;
> +        Object retValue = null;
> +
> +        scanKey[0].setValue( hashCode);
> +        ScanController scan = tc.openScan( btreeConglomerateId,
> +                                           false, // do not hold
> +                                           remove ? TransactionController.OPENMODE_FORUPDATE : 0,
> +                                           TransactionController.MODE_TABLE,
> +                                           TransactionController.ISOLATION_READ_UNCOMMITTED,
> +                                           null, // Scan all the columns
> +                                           scanKey,
> +                                           ScanController.GE,
> +                                           (Qualifier[][]) null,
> +                                           scanKey,
> +                                           ScanController.GT);
> +        try
> +        {
> +            while( scan.fetchNext( btreeRow))
> +            {
> +                if( rowConglomerate.fetch( (RowLocation) btreeRow[1], row, (FormatableBitSet) null /* all columns */)
> +                    && rowMatches( row, key))
> +                {
> +                    if( existenceOnly)
> +                        return this;
> +
> +                    rowCount++;
> +                    if( rowCount == 1)
> +                        retValue = BackingStoreHashtable.cloneRow( row);
> +                    else 
> +                    {
> +                        Vector v;
> +                        if( rowCount == 2)
> +                        {
> +                            v = new Vector( 2);
> +                            v.add( retValue);
> +                            retValue = v;
> +                        }
> +                        else
> +                            v = (Vector) retValue;
> +                        v.add( BackingStoreHashtable.cloneRow( row));
> +                    }
> +                    if( remove)
> +                    {
> +                        rowConglomerate.delete( (RowLocation) btreeRow[1]);
> +                        scan.delete();
> +                        size--;
> +                    }
> +                    if( remove_duplicates)
> +                        // This must be the only row with the key
> +                        return retValue;
> +                }
> +            }
> +        }
> +        finally
> +        {
> +            scan.close();
> +        }
> +        return retValue;
> +    } // end of getRemove
> +
> +
> +    private boolean rowMatches( DataValueDescriptor[] row,
> +                                Object key)
> +    {
> +        if( key_column_numbers.length == 1)
> +            return row[ key_column_numbers[0]].equals( key);
> +
> +        KeyHasher kh = (KeyHasher) key;
> +        for( int i = 0; i < key_column_numbers.length; i++)
> +        {
> +            if( ! row[ key_column_numbers[i]].equals( kh.getObject(i)))
> +                return false;
> +        }
> +        return true;
> +    } // end of rowMatches
> +
> +    /**
> +     * remove all rows with a given key from the hash table.
> +     *
> +     * @param key          The key of the rows to remove.
> +     *
> +     * @return The removed row(s).
> +     *
> +	 * @exception  StandardException  Standard exception policy.
> +     **/
> +    public Object remove( Object key)
> +		throws StandardException
> +    {
> +        return getRemove( key, true, false);
> +    } // end of remove
> +
> +    /**
> +     * @return The number of rows in the hash table
> +     */
> +    public int size()
> +    {
> +        return size;
> +    }
> +    
> +    /**
> +     * Return an Enumeration that can be used to scan entire table.
> +     * <p>
> +     * RESOLVE - is it worth it to support this routine?
> +     *
> +	 * @return The Enumeration.
> +     *
> +	 * @exception  StandardException  Standard exception policy.
> +     **/
> +    public Enumeration elements()
> +        throws StandardException
> +    {
> +        return new ElementEnum();
> +    }
> +
> +    private class ElementEnum implements Enumeration
> +    {
> +        private ScanController scan;
> +        private boolean hasMore;
> +
> +        ElementEnum()
> +        {
> +            try
> +            {
> +                scan = tc.openScan( rowConglomerateId,
> +                                    false, // do not hold
> +                                    0, // read only
> +                                    TransactionController.MODE_TABLE,
> +                                    TransactionController.ISOLATION_NOLOCK,
> +                                    (FormatableBitSet) null, // all columns
> +                                    (DataValueDescriptor[]) null, // no start key
> +                                    0, // no start key operator
> +                                    (Qualifier[][]) null,
> +                                    (DataValueDescriptor[]) null, // no stop key
> +                                    0 /* no stop key operator */);
> +                hasMore = scan.next();
> +                if( ! hasMore)
> +                {
> +                    scan.close();
> +                    scan = null;
> +                }
> +            }
> +            catch( StandardException se)
> +            {
> +                hasMore = false;
> +                if( scan != null)
> +                {
> +                    try
> +                    {
> +                        scan.close();
> +                    }
> +                    catch( StandardException se1){};
> +                    scan = null;
> +                }
> +            }
> +        } // end of constructor
> +
> +        public boolean hasMoreElements()
> +        {
> +            return hasMore;
> +        }
> +
> +        public Object nextElement()
> +        {
> +            if( ! hasMore)
> +                throw new NoSuchElementException();
> +            try
> +            {
> +                scan.fetch( row);
> +                Object retValue =  BackingStoreHashtable.cloneRow( row);
> +                hasMore = scan.next();
> +                if( ! hasMore)
> +                {
> +                    scan.close();
> +                    scan = null;
> +                }
> +
> +                return retValue;
> +            }
> +            catch( StandardException se)
> +            {
> +                if( scan != null)
> +                {
> +                    try
> +                    {
> +                        scan.close();
> +                    }
> +                    catch( StandardException se1){};
> +                    scan = null;
> +                }
> +                throw new NoSuchElementException();
> +            }
> +        } // end of nextElement
> +    } // end of class ElementEnum
> +}
> Index: java/engine/org/apache/derby/iapi/store/access/BackingStoreHashtable.java
> ===================================================================
> --- java/engine/org/apache/derby/iapi/store/access/BackingStoreHashtable.java	(revision 155029)
> +++ java/engine/org/apache/derby/iapi/store/access/BackingStoreHashtable.java	(working copy)
> @@ -29,10 +29,13 @@
>  import org.apache.derby.iapi.types.CloneableObject;
>  import org.apache.derby.iapi.types.DataValueDescriptor;
>  
> +import org.apache.derby.iapi.services.cache.ClassSize;
> +
>  import java.util.Enumeration;
>  import java.util.Hashtable;
>  import java.util.Properties; 
>  import java.util.Vector;
> +import java.util.NoSuchElementException;
>  
>  /**
>  A BackingStoreHashtable is a utility class which will store a set of rows into
> @@ -102,13 +105,36 @@
>       * Fields of the class
>       **************************************************************************
>       */
> +    private TransactionController tc;
>      private Hashtable   hash_table;
>      private int[]       key_column_numbers;
>      private boolean     remove_duplicates;
>  	private boolean		skipNullKeyColumns;
>      private Properties  auxillary_runtimestats;
>  	private RowSource	row_source;
> +    /* If max_inmemory_rowcnt > 0 then use that to decide when to spill to disk.
> +     * Otherwise compute max_inmemory_size based on the JVM memory size when the BackingStoreHashtable
> +     * is constructed and use that to decide when to spill to disk.
> +     */
> +    private long max_inmemory_rowcnt;
> +    private long inmemory_rowcnt;
> +    private long max_inmemory_size;
> +    private boolean keepAfterCommit;
>  
> +    private static int vectorSize; // The estimated number of bytes used by Vector(0)
> +    static {
> +        try
> +        {
> +            vectorSize = ClassSize.estimateBase( java.util.Vector.class);
> +        }
> +        catch( SecurityException se)
> +        {
> +            vectorSize = 4*ClassSize.refSize;
> +        }
> +    };
> +    
> +    private DiskHashtable diskHashtable;
> +
>      /**************************************************************************
>       * Constructors for This class:
>       **************************************************************************
> @@ -163,7 +189,10 @@
>  	 *
>  	 * @param skipNullKeyColumns	Skip rows with a null key column, if true.
>       *
> +     * @param keepAfterCommit If true the hash table is kept after a commit,
> +     *                        if false the hash table is dropped on the next commit.
>       *
> +     *
>  	 * @exception  StandardException  Standard exception policy.
>       **/
>      public BackingStoreHashtable(
> @@ -175,13 +204,21 @@
>      long                    max_inmemory_rowcnt,
>      int                     initialCapacity,
>      float                   loadFactor,
> -	boolean					skipNullKeyColumns)
> +	boolean					skipNullKeyColumns,
> +    boolean                 keepAfterCommit)
>          throws StandardException
>      {
>          this.key_column_numbers    = key_column_numbers;
>          this.remove_duplicates    = remove_duplicates;
>  		this.row_source			   = row_source;
>  		this.skipNullKeyColumns	   = skipNullKeyColumns;
> +        this.max_inmemory_rowcnt = max_inmemory_rowcnt;
> +        if( max_inmemory_rowcnt > 0)
> +            max_inmemory_size = Long.MAX_VALUE;
> +        else
> +            max_inmemory_size = Runtime.getRuntime().totalMemory()/100;
> +        this.tc = tc;
> +        this.keepAfterCommit = keepAfterCommit;
>  
>          Object[] row;
>  
> @@ -280,7 +317,7 @@
>       *
>  	 * @exception  StandardException  Standard exception policy.
>       **/
> -    private Object[] cloneRow(Object[] old_row)
> +    static Object[] cloneRow(Object[] old_row)
>          throws StandardException
>      {
>          Object[] new_row = new DataValueDescriptor[old_row.length];
> @@ -300,8 +337,6 @@
>       * @param row               Row to add to the hash table.
>       * @param hash_table        The java HashTable to load into.
>       *
> -	 * @return true if successful, false if heap add fails.
> -     *
>  	 * @exception  StandardException  Standard exception policy.
>       **/
>      private void add_row_to_hash_table(
> @@ -310,9 +345,14 @@
>      Object[]    row)
>  		throws StandardException
>      {
> +        if( spillToDisk( hash_table, key, row))
> +            return;
> +        
>          Object  duplicate_value = null;
>  
> -        if ((duplicate_value = hash_table.put(key, row)) != null)
> +        if ((duplicate_value = hash_table.put(key, row)) == null)
> +            doSpaceAccounting( row, false);
> +        else
>          {
>              if (!remove_duplicates)
>              {
> @@ -321,6 +361,7 @@
>                  // inserted a duplicate
>                  if ((duplicate_value instanceof Vector))
>                  {
> +                    doSpaceAccounting( row, false);
>                      row_vec = (Vector) duplicate_value;
>                  }
>                  else
> @@ -330,6 +371,7 @@
>  
>                      // insert original row into vector
>                      row_vec.addElement(duplicate_value);
> +                    doSpaceAccounting( row, true);
>                  }
>  
>                  // insert new row into vector
> @@ -345,6 +387,89 @@
>          row = null;
>      }
>  
> +    private void doSpaceAccounting( Object[] row,
> +                                    boolean firstDuplicate)
> +    {
> +        inmemory_rowcnt++;
> +        if( max_inmemory_rowcnt <= 0)
> +        {
> +            for( int i = 0; i < row.length; i++)
> +            {
> +                if( row[i] instanceof DataValueDescriptor)
> +                    max_inmemory_size -= ((DataValueDescriptor) row[i]).estimateMemoryUsage();
> +                max_inmemory_size -= ClassSize.refSize;
> +            }
> +            max_inmemory_size -= ClassSize.refSize;
> +            if( firstDuplicate)
> +                max_inmemory_size -= vectorSize;
> +        }
> +    } // end of doSpaceAccounting
> +
> +    /**
> +     * Determine whether a new row should be spilled to disk and, if so, do it.
> +     *
> +     * @param hash_table The in-memory hash table
> +     * @param key The row's key
> +     * @param row
> +     *
> +     * @return true if the row was spilled to disk, false if not
> +     *
> +     * @exception  StandardException  Standard exception policy.
> +     */
> +    private boolean spillToDisk( Hashtable   hash_table,
> +                                 Object      key,
> +                                 Object[]    row)
> +		throws StandardException
> +    {
> +        // Once we have started spilling all new rows will go to disk, even if we have freed up some
> +        // memory by moving duplicates to disk. This simplifies handling of duplicates and accounting.
> +        if( diskHashtable == null)
> +        {
> +            if( max_inmemory_rowcnt > 0)
> +            {
> +                if( inmemory_rowcnt < max_inmemory_rowcnt)
> +                    return false; // Do not spill
> +            }
> +            else if( max_inmemory_size > 0)
> +                return false;
> +            // Want to start spilling
> +            if( ! (row instanceof DataValueDescriptor[]))
> +            {
> +                if( SanityManager.DEBUG)
> +                    SanityManager.THROWASSERT( "BackingStoreHashtable row is not DataValueDescriptor[]");
> +                // Do not know how to put it on disk
> +                return false;
> +            }
> +            diskHashtable = new DiskHashtable( tc,
> +                                               (DataValueDescriptor[]) row,
> +                                               key_column_numbers,
> +                                               remove_duplicates,
> +                                               keepAfterCommit);
> +        }
> +        
> +        Object duplicateValue = hash_table.get( key);
> +        if( duplicateValue != null)
> +        {
> +            if( remove_duplicates)
> +                return true; // a degenerate case of spilling
> +            // If we are keeping duplicates then move all the duplicates from memory to disk
> +            // This simplifies finding duplicates: they are either all in memory or all on disk.
> +            if( duplicateValue instanceof Vector)
> +            {
> +                Vector duplicateVec = (Vector) duplicateValue;
> +                for( int i = duplicateVec.size() - 1; i >= 0; i--)
> +                {
> +                    Object[] dupRow = (Object[]) duplicateVec.elementAt(i);
> +                    diskHashtable.put( key, dupRow);
> +                }
> +            }
> +            else
> +                diskHashtable.put( key, (Object []) duplicateValue);
> +            hash_table.remove( key);
> +        }
> +        diskHashtable.put( key, row);
> +        return true;
> +    } // end of spillToDisk
>      /**************************************************************************
>       * Public Methods of This class:
>       **************************************************************************
> @@ -364,6 +489,11 @@
>  		throws StandardException
>      {
>          hash_table = null;
> +        if( diskHashtable != null)
> +        {
> +            diskHashtable.close();
> +            diskHashtable = null;
> +        }
>          return;
>      }
>  
> @@ -380,7 +510,9 @@
>      public Enumeration elements()
>          throws StandardException
>      {
> -        return(hash_table.elements());
> +        if( diskHashtable == null)
> +            return(hash_table.elements());
> +        return new BackingStoreHashtableEnumeration();
>      }
>  
>      /**
> @@ -420,7 +552,10 @@
>      public Object get(Object key)
>  		throws StandardException
>      {
> -        return(hash_table.get(key));
> +        Object obj = hash_table.get(key);
> +        if( diskHashtable == null || obj != null)
> +            return obj;
> +        return diskHashtable.get( key);
>      }
>  
>      /**
> @@ -451,7 +586,10 @@
>      Object      key)
>  		throws StandardException
>      {
> -        return(hash_table.remove(key));
> +        Object obj = hash_table.remove(key);
> +        if( obj != null || diskHashtable == null)
> +            return obj;
> +        return diskHashtable.remove(key);
>      }
>  
>      /**
> @@ -553,7 +691,54 @@
>      public int size()
>  		throws StandardException
>      {
> -        return(hash_table.size());
> +        if( diskHashtable == null)
> +            return(hash_table.size());
> +        return hash_table.size() + diskHashtable.size();
>      }
>  
> +    private class BackingStoreHashtableEnumeration implements Enumeration
> +    {
> +        private Enumeration memoryEnumeration;
> +        private Enumeration diskEnumeration;
> +
> +        BackingStoreHashtableEnumeration()
> +        {
> +            memoryEnumeration = hash_table.elements();
> +            if( diskHashtable != null)
> +            {
> +                try
> +                {
> +                    diskEnumeration = diskHashtable.elements();
> +                }
> +                catch( StandardException se)
> +                {
> +                    diskEnumeration = null;
> +                }
> +            }
> +        }
> +        
> +        public boolean hasMoreElements()
> +        {
> +            if( memoryEnumeration != null)
> +            {
> +                if( memoryEnumeration.hasMoreElements())
> +                    return true;
> +                memoryEnumeration = null;
> +            }
> +            if( diskEnumeration == null)
> +                return false;
> +            return diskEnumeration.hasMoreElements();
> +        }
> +
> +        public Object nextElement() throws NoSuchElementException
> +        {
> +            if( memoryEnumeration != null)
> +            {
> +                if( memoryEnumeration.hasMoreElements())
> +                    return memoryEnumeration.nextElement();
> +                memoryEnumeration = null;
> +            }
> +            return diskEnumeration.nextElement();
> +        }
> +    } // end of class BackingStoreHashtableEnumeration
>  }
> Index: java/testing/org/apache/derbyTesting/functionTests/tests/lang/SpillHash.java
> ===================================================================
> --- java/testing/org/apache/derbyTesting/functionTests/tests/lang/SpillHash.java	(revision 0)
> +++ java/testing/org/apache/derbyTesting/functionTests/tests/lang/SpillHash.java	(revision 0)
> @@ -0,0 +1,459 @@
> +/*
> +
> +   Derby - Class org.apache.derbyTesting.functionTests.tests.lang.SpillHash
> +
> +   Copyright 2001, 2004 The Apache Software Foundation or its licensors, as applicable.
> +
> +   Licensed under the Apache License, Version 2.0 (the "License");
> +   you may not use this file except in compliance with the License.
> +   You may obtain a copy of the License at
> +
> +      http://www.apache.org/licenses/LICENSE-2.0
> +
> +   Unless required by applicable law or agreed to in writing, software
> +   distributed under the License is distributed on an "AS IS" BASIS,
> +   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> +   See the License for the specific language governing permissions and
> +   limitations under the License.
> +
> + */
> +
> +package org.apache.derbyTesting.functionTests.tests.lang;
> +
> +import java.sql.Connection;
> +import java.sql.DriverManager;
> +import java.sql.DatabaseMetaData;
> +import java.sql.ResultSet;
> +import java.sql.PreparedStatement;
> +import java.sql.Statement;
> +import java.sql.SQLException;
> +import java.util.BitSet;
> +
> +import org.apache.derby.tools.ij;
> +import org.apache.derby.tools.JDBCDisplayUtil;
> +
> +/**
> + * Test BackingStoreHashtable spilling to disk.
> + * BackingStoreHashtable is used to implement hash joins, distinct, scroll insensitive cursors,
> + * outer joins, and the HAVING clause.
> + */
> +public class SpillHash
> +{
> +    private static PreparedStatement joinStmt;
> +    private static PreparedStatement distinctStmt;
> +    private static PreparedStatement ojStmt1;
> +    private static final int LOTS_OF_ROWS = 10000;
> +    private static int errorCount = 0;
> +    
> +    public static void main (String args[]) 
> +    {
> +        try {
> +            /* Load the JDBC Driver class */
> +            // use the ij utility to read the property file and
> +            // make the initial connection.
> +            ij.getPropertyArg(args);
> +            Connection conn = ij.startJBMS();
> +            Statement stmt = conn.createStatement();
> +
> +            for( int i = 0; i < prep.length; i++)
> +                stmt.executeUpdate( prep[i]);
> +            PreparedStatement insA = conn.prepareStatement( "insert into ta(ca1,ca2) values(?,?)");
> +            PreparedStatement insB = conn.prepareStatement( "insert into tb(cb1,cb2) values(?,?)");
> +            insertDups( insA, insB, initDupVals);
> +
> +            joinStmt =
> +              conn.prepareStatement( "select ta.ca1, ta.ca2, tb.cb2 from ta, tb where ca1 = cb1");
> +            distinctStmt =
> +              conn.prepareStatement( "select distinct ca1 from ta");
> +            ojStmt1 =
> +              conn.prepareStatement( "select cc1, cc2 from tc group by cc1,cc2 having cc1 in (select ta.ca1 from ta)");
> +
> +            runStatements( conn, 0, new String[][][] {initDupVals});
> +
> +            System.out.println( "Growing database.");
> +            
> +            // Add a lot of rows so that the hash tables have to spill to disk
> +            conn.setAutoCommit(false);
> +            for( int i = 1; i <= LOTS_OF_ROWS; i++)
> +            {
> +                insA.setInt(1, i);
> +                insA.setString(2, ca2Val(i));
> +                insA.executeUpdate();
> +                insB.setInt(1, i);
> +                insB.setString(2, cb2Val(i));
> +                insB.executeUpdate();
> +            }
> +            conn.commit();
> +            insertDups( insA, insB, spillDupVals);
> +            conn.commit();
> +
> +            conn.setAutoCommit(true);
> +            runStatements( conn, LOTS_OF_ROWS, new String[][][] {initDupVals, spillDupVals});
> +            
> +            conn.close();
> +        } catch (Exception e) {
> +            System.out.println("FAIL -- unexpected exception "+e);
> +            JDBCDisplayUtil.ShowException(System.out, e);
> +            e.printStackTrace();
> +            errorCount++;
> +        }
> +        if( errorCount == 0)
> +        {
> +            System.out.println( "PASSED.");
> +            System.exit(0);
> +        }
> +        else
> +        {
> +            System.out.println( "FAILED: " + errorCount + ((errorCount == 1) ? " error" : " errors"));
> +            System.exit(1);
> +        }
> +    } // end of main
> +    
> +    private static final String[] prep =
> +    {
> +        "create table ta (ca1 integer, ca2 char(200))",
> +        "create table tb (cb1 integer, cb2 char(200))",
> +        "create table tc (cc1 integer, cc2 varchar(16))",
> +        "insert into ta(ca1,ca2) values(null, 'Anull')",
> +        "insert into tb(cb1,cb2) values(null, 'Bnull')",
> +        "insert into tc(cc1,cc2) values(0, 'C0')"
> +    };
> +
> +    private static final String[][] initDupVals =
> +    {
> +        { "0a", "0b"},
> +        { "1a", "1b"},
> +        { "2a"}
> +    };
> +    private static final String[][] spillDupVals =
> +    {
> +        {},
> +        { "1c"},
> +        { "2b"},
> +        { "3a", "3b", "3c"}
> +    };
> +
> +    private static void insertDups( PreparedStatement insA, PreparedStatement insB, String[][] dupVals)
> +        throws SQLException
> +    {
> +        for( int i = 0; i < dupVals.length; i++)
> +        {
> +            insA.setInt(1, -i);
> +            insB.setInt(1, -i);
> +            String[] vals = dupVals[i];
> +            for( int j = 0; j < vals.length; j++)
> +            {
> +                insA.setString( 2, "A" + vals[j]);
> +                insA.executeUpdate();
> +                insB.setString( 2, "B" + vals[j]);
> +                insB.executeUpdate();
> +            }
> +        }
> +    } // end of insertDups
> +    
> +    private static String ca2Val( int col1Val)
> +    {
> +        return "A" + col1Val;
> +    }
> +    
> +    private static String cb2Val( int col1Val)
> +    {
> +        return "B" + col1Val;
> +    }
> +    
> +    private static void runStatements( Connection conn, int maxColValue, String[][][] dupVals)
> +        throws SQLException
> +    {
> +        runJoin( conn, maxColValue, dupVals);
> +        runDistinct( conn, maxColValue, dupVals);
> +        runOj( conn, maxColValue, dupVals, "1", ojStmt1);
> +        runCursor( conn, maxColValue, dupVals);
> +    }
> +
> +    private static void runJoin( Connection conn, int maxColValue, String[][][] dupVals)
> +        throws SQLException
> +    {
> +        System.out.println( "Running join");
> +        int expectedRowCount = maxColValue; // plus expected duplicates, to be counted below
> +        ResultSet rs = joinStmt.executeQuery();
> +        BitSet joinRowFound = new BitSet( maxColValue);
> +        int dupKeyCount = 0;
> +        for( int i = 0; i < dupVals.length; i++)
> +        {
> +            if( dupVals[i].length > dupKeyCount)
> +                dupKeyCount = dupVals[i].length;
> +        }
> +        BitSet[] dupsFound = new BitSet[dupKeyCount];
> +        int[] dupCount = new int[ dupKeyCount];
> +        for( int i = 0; i < dupKeyCount; i++)
> +        {
> +            // count the number of rows with column(1) == -i
> +            dupCount[i] = 0;
> +            for( int j = 0; j < dupVals.length; j++)
> +            {
> +                if( i < dupVals[j].length)
> +                    dupCount[i] += dupVals[j][i].length;
> +            }
> +            dupsFound[i] = new BitSet(dupCount[i]*dupCount[i]);
> +            expectedRowCount += dupCount[i]*dupCount[i];
> +        }
> +        
> +        int count;
> +        for( count = 0; rs.next(); count++)
> +        {
> +            int col1Val = rs.getInt(1);
> +            if( rs.wasNull())
> +            {
> +                System.out.println( "Null in join column.");
> +                errorCount++;
> +                continue;
> +            }
> +            if( col1Val > maxColValue)
> +            {
> +                System.out.println( "Invalid value in first join column.");
> +                errorCount++;
> +                continue;
> +            }
> +            if( col1Val > 0)
> +            {
> +                if( joinRowFound.get( col1Val - 1))
> +                {
> +                    System.out.println( "Multiple rows for value " + col1Val);
> +                    errorCount++;
> +                }
> +                joinRowFound.set( col1Val - 1);
> +                String col2Val = trim( rs.getString(2));
> +                String col3Val = trim( rs.getString(3));
> +                if( !( ca2Val( col1Val).equals( col2Val) && cb2Val( col1Val).equals( col3Val)))
> +                {
> +                    System.out.println( "Incorrect value in column 2 or 3 of join.");
> +                    errorCount++;
> +                }
> +            }
> +            else // col1Val <= 0, there are duplicates in the source tables
> +            {
> +                int dupKeyIdx = -col1Val;
> +                int col2Idx = findDupVal( rs, 2, 'A', dupKeyIdx, dupVals);
> +                int col3Idx = findDupVal( rs, 3, 'B', dupKeyIdx, dupVals);
> +                if( col2Idx < 0 || col3Idx < 0)
> +                    continue;
> +
> +                int idx = col2Idx + dupCount[dupKeyIdx]*col3Idx;
> +                if( dupsFound[dupKeyIdx].get( idx))
> +                {
> +                    System.out.println( "Repeat of row with key value 0");
> +                    errorCount++;
> +                }
> +                dupsFound[dupKeyIdx].set( idx);
> +            }
> +        };
> +        if( count != expectedRowCount)
> +        {
> +            System.out.println( "Incorrect number of rows in join.");
> +            errorCount++;
> +        }
> +        rs.close();
> +    } // end of runJoin
> +
> +    private static int findDupVal( ResultSet rs, int col, char prefix, int keyIdx, String[][][] dupVals)
> +        throws SQLException
> +    {
> +        String colVal = rs.getString(col);
> +        if( colVal != null && colVal.length() > 1 && colVal.charAt(0) == prefix)
> +        {
> +            colVal = trim( colVal.substring( 1));
> +            int dupIdx = 0;
> +            for( int i = 0; i < dupVals.length; i++)
> +            {
> +                if( keyIdx < dupVals[i].length)
> +                {
> +                    for( int j = 0; j < dupVals[i][keyIdx].length; j++, dupIdx++)
> +                    {
> +                        if( colVal.equals( dupVals[i][keyIdx][j]))
> +                            return dupIdx;
> +                    }
> +                }
> +            }
> +        }
> +        System.out.println( "Incorrect value in column " + col + " of join with duplicate keys.");
> +        errorCount++;
> +        return -1;
> +    } // end of findDupVal
> +        
> +    private static String trim( String str)
> +    {
> +        if( str == null)
> +            return str;
> +        return str.trim();
> +    }
> +    
> +    private static void runDistinct( Connection conn, int maxColValue, String[][][] dupVals)
> +        throws SQLException
> +    {
> +        System.out.println( "Running distinct");
> +        ResultSet rs = distinctStmt.executeQuery();
> +        checkAllCa1( rs, false, maxColValue, dupVals, "DISTINCT");
> +    }
> +
> +    private static void checkAllCa1( ResultSet rs, boolean expectDups, int maxColValue, String[][][] dupVals, String label)
> +        throws SQLException
> +    {
> +        int dupKeyCount = 0;
> +        for( int i = 0; i < dupVals.length; i++)
> +        {
> +            if( dupVals[i].length > dupKeyCount)
> +                dupKeyCount = dupVals[i].length;
> +        }
> +        int[] expectedDupCount = new int[dupKeyCount];
> +        int[] dupFoundCount = new int[dupKeyCount];
> +        for( int i = 0; i < dupKeyCount; i++)
> +        {
> +            
> +            dupFoundCount[i] = 0;
> +            if( !expectDups)
> +                expectedDupCount[i] = 1;
> +            else
> +            {
> +                expectedDupCount[i] = 0;
> +                for( int j = 0; j < dupVals.length; j++)
> +                {
> +                    if( i < dupVals[j].length)
> +                        expectedDupCount[i] += dupVals[j][i].length;
> +                }
> +            }
> +        }
> +        BitSet found = new BitSet( maxColValue);
> +        int count = 0;
> +        boolean nullFound = false;
> +        try
> +        {
> +            for( count = 0; rs.next();)
> +            {
> +                int col1Val = rs.getInt(1);
> +                if( rs.wasNull())
> +                {
> +                    if( nullFound)
> +                    {
> +                        System.out.println( "Too many nulls returned by " + label);
> +                        errorCount++;
> +                        continue;
> +                    }
> +                    nullFound = true;
> +                    continue;
> +                }
> +                if( col1Val <= -dupKeyCount || col1Val > maxColValue)
> +                {
> +                    System.out.println( "Invalid value returned by " + label);
> +                    errorCount++;
> +                    continue;
> +                }
> +                if( col1Val <= 0)
> +                {
> +                    dupFoundCount[ -col1Val]++;
> +                    if( !expectDups)
> +                    {
> +                        if( dupFoundCount[ -col1Val] > 1)
> +                        {
> +                            System.out.println( label + " returned a duplicate.");
> +                            errorCount++;
> +                            continue;
> +                        }
> +                    }
> +                    else if( dupFoundCount[ -col1Val] > expectedDupCount[ -col1Val])
> +                    {
> +                        System.out.println( label + " returned too many duplicates.");
> +                        errorCount++;
> +                        continue;
> +                    }
> +                }
> +                else
> +                {
> +                    if( found.get( col1Val))
> +                    {
> +                        System.out.println( label + " returned a duplicate.");
> +                        errorCount++;
> +                        continue;
> +                    }
> +                    found.set( col1Val);
> +                    count++;
> +                }
> +            }
> +            if( count != maxColValue)
> +            {
> +                System.out.println( "Incorrect number of rows in " + label);
> +                errorCount++;
> +            }
> +            for( int i = 0; i < dupFoundCount.length; i++)
> +            {
> +                if( dupFoundCount[i] != expectedDupCount[i])
> +                {
> +                    System.out.println( "A duplicate key row is missing in " + label);
> +                    errorCount++;
> +                    break;
> +                }
> +            }
> +        }
> +        finally
> +        {
> +            rs.close();
> +        }
> +    } // End of checkAllCa1
> +
> +    private static void runOj( Connection conn,
> +                               int maxColValue,
> +                               String[][][] dupVals,
> +                               String ojId,
> +                               PreparedStatement ojStmt)
> +        throws SQLException
> +    {
> +        System.out.println( "Running outer join " + ojId);
> +        ResultSet rs = ojStmt.executeQuery();
> +        try
> +        {
> +            if( ! rs.next())
> +            {
> +                System.out.println( "Outer join " + ojId + " returned no rows.");
> +                errorCount++;
> +                return;
> +            }
> +            int cc1 = rs.getInt(1);
> +            if( rs.wasNull())
> +            {
> +                System.out.println( "Outer join " + ojId + " returned null");
> +                errorCount++;
> +                return;
> +            }
> +            String cc2 = trim(rs.getString(2));
> +            if( rs.wasNull())
> +            {
> +                System.out.println( "Outer join " + ojId + " returned null");
> +                errorCount++;
> +                return;
> +            }
> +            if( cc1 != 0 || ! "C0".equals( cc2))
> +            {
> +                System.out.println( "Outer join " + ojId + " returned wrong values");
> +                errorCount++;
> +                return;
> +            }
> +            if( rs.next())
> +            {
> +                System.out.println( "Outer join " + ojId + " returned too many rows.");
> +                errorCount++;
> +            }
> +        }
> +        finally
> +        {
> +            rs.close();
> +        }
> +    } // End of runOj
> +
> +    private static void runCursor( Connection conn, int maxColValue, String[][][] dupVals)
> +        throws SQLException
> +    {
> +        System.out.println( "Running scroll insensitive cursor");
> +        Statement stmt = conn.createStatement(ResultSet.TYPE_SCROLL_INSENSITIVE, ResultSet.CONCUR_READ_ONLY);
> +        ResultSet rs = stmt.executeQuery( "SELECT ca1 FROM ta");
> +        checkAllCa1( rs, true, maxColValue, dupVals, "scroll insensitive cursor");
> +    }
> +}
> Index: java/testing/org/apache/derbyTesting/functionTests/tests/store/TestDiskHashtable.java
> ===================================================================
> --- java/testing/org/apache/derbyTesting/functionTests/tests/store/TestDiskHashtable.java	(revision 0)
> +++ java/testing/org/apache/derbyTesting/functionTests/tests/store/TestDiskHashtable.java	(revision 0)
> @@ -0,0 +1,432 @@
> +/*
> +
> +   Derby - Class org.apache.derbyTesting.functionTests.tests.store.TestDiskHashtable
> +
> +   Copyright 2005 The Apache Software Foundation or its licensors, as applicable.
> +
> +   Licensed under the Apache License, Version 2.0 (the "License");
> +   you may not use this file except in compliance with the License.
> +   You may obtain a copy of the License at
> +
> +      http://www.apache.org/licenses/LICENSE-2.0
> +
> +   Unless required by applicable law or agreed to in writing, software
> +   distributed under the License is distributed on an "AS IS" BASIS,
> +   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
> +   See the License for the specific language governing permissions and
> +   limitations under the License.
> +
> + */
> +
> +package org.apache.derbyTesting.functionTests.tests.store;
> +
> +import java.sql.Connection;
> +import java.sql.ResultSet;
> +import java.sql.SQLException;
> +import java.sql.Statement;
> +
> +import java.util.BitSet;
> +import java.util.Enumeration;
> +import java.util.HashMap;
> +import java.util.Vector;
> +
> +import org.apache.derby.iapi.error.PublicAPI;
> +import org.apache.derby.iapi.error.StandardException;
> +import org.apache.derby.iapi.sql.conn.ConnectionUtil;
> +import org.apache.derby.iapi.sql.conn.LanguageConnectionContext;
> +import org.apache.derby.iapi.store.access.DiskHashtable;
> +import org.apache.derby.iapi.store.access.KeyHasher;
> +import org.apache.derby.iapi.store.access.TransactionController;
> +import org.apache.derby.iapi.types.DataValueDescriptor;
> +import org.apache.derby.iapi.types.Orderable;
> +import org.apache.derby.iapi.types.SQLInteger;
> +import org.apache.derby.iapi.types.SQLLongint;
> +import org.apache.derby.iapi.types.SQLVarchar;
> +import org.apache.derby.tools.ij;
> +import org.apache.derbyTesting.functionTests.util.TestUtil;
> +
> +/**
> + * This program tests the org.apache.derby.iapi.store.access.DiskHashtable class.
> + * The unit test interface is not used because it is undocumented and very difficult to decipher.
> + * Furthermore, it is difficult to diagnose problems when using the unit test interface.
> + *
> + * Created: Wed Feb 09 15:44:12 2005
> + *
> + * @author <a href="mailto:klebanof@us.ibm.com">Jack Klebanoff</a>
> + * @version 1.0
> + */
> +public class TestDiskHashtable 
> +{
> +    private TransactionController tc;
> +    private int failed = 0;
> +    
> +    public static void main( String args[])
> +    {
> +        int failed = 1;
> +
> +		REPORT("Test DiskHashtable starting");
> +        try
> +        {
> +			// use the ij utility to read the property file and
> +			// make the initial connection.
> +			ij.getPropertyArg(args);
> +			Connection conn = ij.startJBMS();
> +            Statement stmt = conn.createStatement();
> +            stmt.execute("CREATE FUNCTION testDiskHashtable() RETURNS INTEGER EXTERNAL NAME 'org.apache.derbyTesting.functionTests.tests.store.TestDiskHashtable.runTests' LANGUAGE JAVA PARAMETER STYLE JAVA");
> +            ResultSet rs = stmt.executeQuery( "values( testDiskHashtable())");
> +            if( rs.next())
> +                failed = rs.getInt(1);
> +            stmt.close();
> +            conn.close();
> +        }
> +        catch( SQLException e)
> +        {
> +            TestUtil.dumpSQLExceptions( e);
> +            failed = 1;
> +        }
> +        catch( Throwable t)
> +        {
> +            REPORT("FAIL -- unexpected exception: " + t.toString());
> +            failed = 1;
> +        }
> +        REPORT( (failed == 0) ? "OK" : "FAILED");
> +        System.exit( (failed == 0) ? 0 : 1);
> +    }
> +
> +    private void REPORT_FAILURE(String msg)
> +    {
> +        failed = 1;
> +        REPORT( msg);
> +    }
> +    
> +    private static void REPORT(String msg)
> +    {
> +        System.out.println( msg);
> +    }
> +    
> +    public static int runTests() throws SQLException
> +    {
> +        TestDiskHashtable tester = new TestDiskHashtable();
> +        return tester.doIt();
> +    }
> +
> +    private TestDiskHashtable() throws SQLException
> +    {
> +        LanguageConnectionContext lcc = ConnectionUtil.getCurrentLCC();
> +        if( lcc == null)
> +            throw new SQLException( "Cannot get the LCC");
> +        tc = lcc.getTransactionExecute();
> +    }
> +
> +    private int doIt() throws SQLException
> +    {
> +        try {
> +            REPORT( "Starting single key, keep duplicates test");
> +            testOneVariant( tc, false, singleKeyTemplate, singleKeyCols, singleKeyRows);
> +            REPORT( "Starting single key, remove duplicates test");
> +            testOneVariant( tc, true, singleKeyTemplate, singleKeyCols, singleKeyRows);
> +            REPORT( "Starting multiple key, keep duplicates test");
> +            testOneVariant( tc, false, multiKeyTemplate, multiKeyCols, multiKeyRows);
> +            REPORT( "Starting multiple key, remove duplicates test");
> +            testOneVariant( tc, true, multiKeyTemplate, multiKeyCols, multiKeyRows);
> +
> +            tc.commit();
> +        }
> +        catch (StandardException se)
> +        {
> +            throw PublicAPI.wrapStandardException( se);
> +        }
> +        return failed;
> +    } // end of doIt
> +
> +    private static final DataValueDescriptor[] singleKeyTemplate = { new SQLInteger(), new SQLVarchar()};
> +    private static final int[] singleKeyCols = {0};
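> +    // Rows 0 and 3 share key 1, so the single key tests exercise duplicate handling.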
> +    private static final DataValueDescriptor[][] singleKeyRows =
> +    {
> +        {new SQLInteger(1), new SQLVarchar("abcd")},
> +        {new SQLInteger(2), new SQLVarchar("abcd")},
> +        {new SQLInteger(3), new SQLVarchar("e")},
> +        {new SQLInteger(1), new SQLVarchar("zz")}
> +    };
> +
> +    private static final DataValueDescriptor[] multiKeyTemplate = { new SQLLongint(), new SQLVarchar(), new SQLInteger()};
> +    private static final int[] multiKeyCols = {1, 0};
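> +    // With key columns (1,0), rows 1 and 2 share the key ("aa", 2); row 0 has a
> +    // null in its non-key column.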
> +    private static final DataValueDescriptor[][] multiKeyRows =
> +    {
> +        {new SQLLongint(1), new SQLVarchar( "aa"), multiKeyTemplate[2].getNewNull()},
> +        {new SQLLongint(2), new SQLVarchar( "aa"), new SQLInteger(1)},
> +        {new SQLLongint(2), new SQLVarchar( "aa"), new SQLInteger(2)},
> +        {new SQLLongint(2), new SQLVarchar( "b"), new SQLInteger(1)}
> +    };
> +
> +    private static final int LOTS_OF_ROWS_COUNT = 50000;
> +    
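> +    // Insert the given rows into both a DiskHashtable and a plain HashMap of
> +    // Vectors, cross-checking put, get, elements, and remove after each step.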
> +    private void testOneVariant( TransactionController tc,
> +                                 boolean removeDups,
> +                                 DataValueDescriptor[] template,
> +                                 int[] keyCols,
> +                                 DataValueDescriptor[][] rows)
> +        throws StandardException
> +    {
> +        DiskHashtable dht = new DiskHashtable(tc, template, keyCols, removeDups, false);
> +        boolean[] isDuplicate = new boolean[ rows.length];
> +        boolean[] found = new boolean[ rows.length];
> +        HashMap simpleHash = new HashMap( rows.length);
> +
> +        testElements( removeDups, dht, keyCols, 0, rows, simpleHash, isDuplicate, found);
> +
> +        for( int i = 0; i < rows.length; i++)
> +        {
> +            Object key = KeyHasher.buildHashKey( rows[i], keyCols);
> +            Vector al = (Vector) simpleHash.get( key);
> +            isDuplicate[i] = (al != null);
> +            if( al == null)
> +            {
> +                al = new Vector(4);
> +                simpleHash.put( key, al);
> +            }
> +            if( (!removeDups) || !isDuplicate[i])
> +                al.add( rows[i]);
> +            
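> +            // put() should return false only when duplicates are being removed
> +            // and the key was already present.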
> +            if( dht.put( key, rows[i]) != (removeDups ? (!isDuplicate[i]) : true))
> +                REPORT_FAILURE( "  put returned wrong value on row " + i);
> +
> +            for( int j = 0; j <= i; j++)
> +            {
> +                key = KeyHasher.buildHashKey( rows[j], keyCols);
> +                if( ! rowsEqual( dht.get( key), simpleHash.get( key)))
> +                    REPORT_FAILURE( "  get returned wrong value on key " + j);
> +            }
> +
> +            testElements( removeDups, dht, keyCols, i+1, rows, simpleHash, isDuplicate, found);
> +        }
> +        // Remove them
> +        for( int i = 0; i < rows.length; i++)
> +        {
> +            Object key = KeyHasher.buildHashKey( rows[i], keyCols);
> +            if( ! rowsEqual( dht.remove( key), simpleHash.get( key)))
> +                REPORT_FAILURE( "  remove returned wrong value on key " + i);
> +            simpleHash.remove( key);
> +            if( dht.get( key) != null)
> +                REPORT_FAILURE( "  remove did not delete key " + i);
> +        }
> +        testElements( removeDups, dht, keyCols, 0, rows, simpleHash, isDuplicate, found);
> +
> +        testLargeTable( dht, keyCols, rows[0]);
> +        dht.close();
> +    } // end of testOneVariant
> +
> +    private void testLargeTable( DiskHashtable dht,
> +                                 int[] keyCols,
> +                                 DataValueDescriptor[] aRow)
> +        throws StandardException
> +    {
> +        // Add a lot of elements.
> +        // If there are two or more key columns then we vary the first two key columns,
> +        // using an approximately square matrix of integer key values. Because the hash
> +        // generator is commutative, key (i,j) hashes into the same bucket as key (j,i),
> +        // testing the case where different keys hash into the same bucket.
> +        int key1Count = (keyCols.length > 1) ? ((int) Math.round( Math.sqrt( (double) LOTS_OF_ROWS_COUNT))) : 1;
> +        int key0Count = (LOTS_OF_ROWS_COUNT + key1Count - 1)/key1Count;
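> +        // For example, with LOTS_OF_ROWS_COUNT = 50000 and two key columns:
> +        // key1Count = round(sqrt(50000)) = 224 and key0Count = (50000 + 223)/224 = 224,
> +        // so the key matrix actually holds 224 * 224 = 50176 rows, slightly more
> +        // than LOTS_OF_ROWS_COUNT.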
> +
> +        DataValueDescriptor[] row = new DataValueDescriptor[ aRow.length];
> +        for( int i = 0; i < row.length; i++)
> +            row[i] = aRow[i].getClone();
> +        
> +        for( int key0Idx = 0; key0Idx < key0Count; key0Idx++)
> +        {
> +            row[ keyCols[0]].setValue( key0Idx);
> +            for( int key1Idx = 0; key1Idx < key1Count; key1Idx++)
> +            {
> +                if( keyCols.length > 1)
> +                    row[ keyCols[1]].setValue( key1Idx);
> +                Object key = KeyHasher.buildHashKey( row, keyCols);
> +                if( ! dht.put( key, row))
> +                {
> +                    REPORT_FAILURE( "  put returned wrong value for key(" + key0Idx + "," + key1Idx + ")");
> +                    key0Idx = key0Count;
> +                    break;
> +                }
> +            }
> +        }
> +        for( int key0Idx = 0; key0Idx < key0Count; key0Idx++)
> +        {
> +            row[ keyCols[0]].setValue( key0Idx);
> +            for( int key1Idx = 0; key1Idx < key1Count; key1Idx++)
> +            {
> +                if( keyCols.length > 1)
> +                    row[ keyCols[1]].setValue( key1Idx);
> +                Object key = KeyHasher.buildHashKey( row, keyCols);
> +                if( ! rowsEqual( dht.get( key), row))
> +                {
> +                    REPORT_FAILURE( "  large table get returned wrong value for key(" + key0Idx + "," + key1Idx + ")");
> +                    key0Idx = key0Count;
> +                    break;
> +                }
> +            }
> +        }
> +        BitSet found = new BitSet(key0Count * key1Count);
> +        Enumeration elements = dht.elements();
> +        while( elements.hasMoreElements())
> +        {
> +            Object el = elements.nextElement();
> +            if( ! (el instanceof DataValueDescriptor[]))
> +            {
> +                REPORT_FAILURE( "  large table enumeration returned wrong element type");
> +                break;
> +            }
> +            DataValueDescriptor[] fetchedRow = (DataValueDescriptor[]) el;
> +            
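> +            // Recover the linear index of the fetched row: i = key0 * key1Count + key1.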
> +            int i = fetchedRow[ keyCols[0]].getInt() * key1Count;
> +            if( keyCols.length > 1)
> +                i += fetchedRow[ keyCols[1]].getInt();
> +            if( i >= key0Count * key1Count)
> +            {
> +                REPORT_FAILURE( "  large table enumeration returned invalid element");
> +                break;
> +            }
> +                
> +            if( found.get(i))
> +            {
> +                REPORT_FAILURE( "  large table enumeration returned same element twice");
> +                break;
> +            }
> +            found.set(i);
> +        }
> +        for( int i = key0Count * key1Count - 1; i >= 0; i--)
> +        {
> +            if( !found.get(i))
> +            {
> +                REPORT_FAILURE( "  large table enumeration missed at least one element");
> +                break;
> +            }
> +        }
> +    } // end of testLargeTable
> +
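> +    // Enumerate the whole hash table and verify that every inserted row appears
> +    // exactly once, except rows dropped as duplicates when removeDups is set.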
> +    private void testElements( boolean removeDups,
> +                               DiskHashtable dht,
> +                               int[] keyCols,
> +                               int rowCount,
> +                               DataValueDescriptor[][] rows,
> +                               HashMap simpleHash,
> +                               boolean[] isDuplicate,
> +                               boolean[] found)
> +        throws StandardException
> +    {
> +        for( int i = 0; i < rowCount; i++)
> +            found[i] = false;
> +        
> +        for( Enumeration e = dht.elements(); e.hasMoreElements();)
> +        {
> +            Object el = e.nextElement();
> +            if( el == null)
> +            {
> +                REPORT_FAILURE( "  table enumeration returned a null element");
> +                return;
> +            }
> +            if( el instanceof DataValueDescriptor[])
> +                checkElement( (DataValueDescriptor[]) el, rowCount, rows, found);
> +            else if( el instanceof Vector)
> +            {
> +                Vector v = (Vector) el;
> +                for( int i = 0; i < v.size(); i++)
> +                    checkElement( (DataValueDescriptor[]) v.get(i), rowCount, rows, found);
> +            }
> +            else
> +            {
> +                REPORT_FAILURE( "  table enumeration returned an incorrect element type");
> +                return;
> +            }
> +        }
> +        for( int i = 0; i < rowCount; i++)
> +        {
> +            if( (removeDups && isDuplicate[i]))
> +            {
> +                if( found[i])
> +                {
> +                    REPORT_FAILURE( "  table enumeration did not remove duplicates");
> +                    return;
> +                }
> +            }
> +            else if( ! found[i])
> +            {
> +                REPORT_FAILURE( "  table enumeration missed at least one element");
> +                return;
> +            }
> +        }
> +    } // end of testElements
> +
> +    private void checkElement( DataValueDescriptor[] fetchedRow,
> +                               int rowCount,
> +                               DataValueDescriptor[][] rows,
> +                               boolean[] found)
> +        throws StandardException
> +    {
> +        for( int i = 0; i < rowCount; i++)
> +        {
> +            if( rowsEqual( fetchedRow, rows[i]))
> +            {
> +                if( found[i])
> +                {
> +                    REPORT_FAILURE( "  table enumeration returned the same element twice");
> +                    return;
> +                }
> +                found[i] = true;
> +                return;
> +            }
> +        }
> +        REPORT_FAILURE( "  table enumeration returned an incorrect element");
> +    } // end of checkElement
> +
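> +    // Compare lookup results, which may be a single row (DataValueDescriptor[])
> +    // or a Vector of rows when a key has duplicates; a Vector of size one is
> +    // treated as equal to the single row it contains.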
> +    private boolean rowsEqual( Object r1, Object r2)
> +        throws StandardException
> +    {
> +        if( r1 == null)
> +            return r2 == null;
> +
> +        if( r1 instanceof DataValueDescriptor[])
> +        {
> +            DataValueDescriptor[] row1 = (DataValueDescriptor[]) r1;
> +            DataValueDescriptor[] row2;
> +            
> +            if( r2 instanceof Vector)
> +            {
> +                Vector v2 = (Vector) r2;
> +                if( v2.size() != 1)
> +                    return false;
> +                row2 = (DataValueDescriptor[]) v2.elementAt(0);
> +            }
> +            else if( r2 instanceof DataValueDescriptor[])
> +                row2 = (DataValueDescriptor[]) r2;
> +            else
> +                return false;
> +            
> +            if( row1.length != row2.length)
> +                return false;
> +            for( int i = 0; i < row1.length; i++)
> +            {
> +                if( ! row1[i].compare( Orderable.ORDER_OP_EQUALS, row2[i], true, true))
> +                    return false;
> +            }
> +            return true;
> +        }
> +        if( r1 instanceof Vector)
> +        {
> +            if( !(r2 instanceof Vector))
> +                return false;
> +            Vector v1 = (Vector) r1;
> +            Vector v2 = (Vector) r2;
> +            if( v1.size() != v2.size())
> +                return false;
> +            for( int i = v1.size() - 1; i >= 0; i--)
> +            {
> +                if( ! rowsEqual( v1.elementAt( i), v2.elementAt(i)))
> +                    return false;
> +            }
> +            return true;
> +        }
> +        // Fall back to Object.equals for any other type.
> +        return r1.equals( r2);
> +    } // end of rowsEqual
> +}
> Index: java/testing/org/apache/derbyTesting/functionTests/master/TestDiskHashtable.out
> ===================================================================
> --- java/testing/org/apache/derbyTesting/functionTests/master/TestDiskHashtable.out	(revision 0)
> +++ java/testing/org/apache/derbyTesting/functionTests/master/TestDiskHashtable.out	(revision 0)
> @@ -0,0 +1,6 @@
> +Test DiskHashtable starting
> +Starting single key, keep duplicates test
> +Starting single key, remove duplicates test
> +Starting multiple key, keep duplicates test
> +Starting multiple key, remove duplicates test
> +OK
> Index: java/testing/org/apache/derbyTesting/functionTests/master/SpillHash.out
> ===================================================================
> --- java/testing/org/apache/derbyTesting/functionTests/master/SpillHash.out	(revision 0)
> +++ java/testing/org/apache/derbyTesting/functionTests/master/SpillHash.out	(revision 0)
> @@ -0,0 +1,10 @@
> +Running join
> +Running distinct
> +Running outer join 1
> +Running scroll insensitive cursor
> +Growing database.
> +Running join
> +Running distinct
> +Running outer join 1
> +Running scroll insensitive cursor
> +PASSED.
> Index: java/testing/org/apache/derbyTesting/functionTests/suites/derbylang.runall
> ===================================================================
> --- java/testing/org/apache/derbyTesting/functionTests/suites/derbylang.runall	(revision 155029)
> +++ java/testing/org/apache/derbyTesting/functionTests/suites/derbylang.runall	(working copy)
> @@ -107,6 +107,7 @@
>  lang/select.sql
>  lang/simpleThreadWrapper.java
>  lang/specjPlans.sql
> +lang/SpillHash.java
>  lang/staleplans.sql
>  lang/stmtCache0.sql
>  lang/stmtCache1.sql
> Index: build.xml
> ===================================================================
> --- build.xml	(revision 155029)
> +++ build.xml	(working copy)
> @@ -413,6 +413,7 @@
>        <arg value="java.math.BigDecimal"/>
>        <arg value="java.util.ArrayList"/>
>        <arg value="java.util.GregorianCalendar"/>
> +      <arg value="java.util.Vector"/>
>      </java>
>  
>      <javac

