db-derby-dev mailing list archives

From Vathsala Weerasinghe <weerasinghe.vaths...@gmail.com>
Subject Re: Derby hash join performance improvement
Date Sat, 17 Aug 2013 02:25:22 GMT
Hi Rick,

Thank you very much for the quick response. Yes, we are interested in
improving the execution-time performance of hash joins. We will look into
the class you suggested. This will really help us get started.

Thank you very much again for the valuable information.


On 16 August 2013 23:20, Rick Hillegas <rick.hillegas@oracle.com> wrote:
> On 8/16/13 8:34 AM, Vathsala Weerasinghe wrote:
>> Hi,
>> I'm Vathsala Weerasinghe, a final year undergraduate of the Department
>> of Computer Science and Engineering, University of Moratuwa, Sri
>> Lanka.
>> Currently we are looking into the Derby source to improve hash
>> join performance as a group project.
>> We are trying to identify the implementation of the hash join in
>> Derby. Now we are looking into the HashJoinStrategy class in the
>> org.apache.derby.impl.sql.compile package.
>> Can someone guide us towards the correct path or provide us some good
>> resources which will help us in tackling this problem?
>> Thanks in advance.
> Hi Vathsala,
> JoinStrategy is an interface which represents the two approaches which the
> optimizer may pick for joining tables. There are two implementations of
> JoinStrategy: NestedLoopJoinStrategy and HashJoinStrategy. The
> JoinStrategies are purely compile-time structures and they disappear at
> execution time.
> You may be interested in changing the Derby cost model so that the optimizer
> picks a HashJoinStrategy more or less often. However, I suspect that you are
> really interested in improving the execution-time performance of hash joins.
> If that is the case, then you will want to look at the following class:
> HashScanResultSet - This is the right child of the join when performing a
> hash join. At initialization time, this node reads the right table of the
> join and builds a hash map, mapping keys to full rows. A join node sits
> above this HashScanResultSet. The join node reads rows from its driving,
> left child. For each left row, the join node asks the HashScanResultSet to
> match that row's key to all matching rows in the right table. The
> HashScanResultSet uses the key to probe for matches in the hash map which it
> built at initialization time.
> Hope this helps,
> -Rick
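The build/probe behaviour Rick describes for HashScanResultSet can be illustrated with a small, generic hash join. This is a minimal sketch under stated assumptions, not Derby's code: rows are modeled as plain `Object[]` arrays, the join key is a single column index, and all names (`HashJoinSketch`, `build`, `hashJoin`, `nestedLoopJoin`) are hypothetical. Derby's real classes work with their own row and hash table types, which this does not reproduce. A nested-loop join is included only for contrast with the other join strategy mentioned above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Generic hash join sketch; NOT Derby's implementation.
// Rows are modeled as Object[]; all names here are hypothetical.
public class HashJoinSketch {

    // Build phase: scan the right (inner) input once and map each join key to
    // every right row carrying that key. NULL keys are skipped because in SQL
    // NULL never equals anything, so such rows can never match.
    static Map<Object, List<Object[]>> build(List<Object[]> right, int rightKey) {
        Map<Object, List<Object[]>> table = new HashMap<>();
        for (Object[] row : right) {
            Object key = row[rightKey];
            if (key == null) continue;
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    // Probe phase: for each row of the driving (left) input, look up its key
    // and emit one joined row per matching right row.
    static List<Object[]> hashJoin(List<Object[]> left, int leftKey,
                                   List<Object[]> right, int rightKey) {
        Map<Object, List<Object[]>> table = build(right, rightKey);
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            Object key = l[leftKey];
            if (key == null) continue;
            List<Object[]> matches = table.get(key);
            if (matches == null) continue;
            for (Object[] r : matches) {
                out.add(concat(l, r));
            }
        }
        return out;
    }

    // Nested-loop join for comparison: rescans the entire right input for every left row.
    static List<Object[]> nestedLoopJoin(List<Object[]> left, int leftKey,
                                         List<Object[]> right, int rightKey) {
        List<Object[]> out = new ArrayList<>();
        for (Object[] l : left) {
            for (Object[] r : right) {
                if (l[leftKey] != null && l[leftKey].equals(r[rightKey])) {
                    out.add(concat(l, r));
                }
            }
        }
        return out;
    }

    // Joined row = left columns followed by right columns.
    static Object[] concat(Object[] a, Object[] b) {
        Object[] c = Arrays.copyOf(a, a.length + b.length);
        System.arraycopy(b, 0, c, a.length, b.length);
        return c;
    }

    public static void main(String[] args) {
        List<Object[]> left = List.of(
            new Object[]{1, "a"}, new Object[]{2, "b"}, new Object[]{3, "c"});
        List<Object[]> right = List.of(
            new Object[]{1, "x"}, new Object[]{1, "y"}, new Object[]{3, "z"});
        // Prints [1, a, 1, x], [1, a, 1, y], [3, c, 3, z], one per line.
        for (Object[] row : hashJoin(left, 0, right, 0)) {
            System.out.println(Arrays.toString(row));
        }
    }
}
```

The design point the sketch makes concrete: the hash join reads the right input once to build the map, so each left row costs one expected-constant-time lookup, whereas the nested-loop version rescans the whole right input for every left row. Execution-time improvements to a real hash join typically target the build and probe loops and the memory used by the map.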

Vathsala Weerasinghe
