db-derby-dev mailing list archives

From Rick Hillegas <rick.hille...@oracle.com>
Subject Re: Store api question: how to ask for RowLocations
Date Tue, 15 Oct 2013 12:10:18 GMT
Hi Mike,

There's a revised patch, 
derby-3155-03-af-backingStoreHashtableWithRowLocation.diff, waiting for 
your review when you have the cycles.


On 10/4/13 11:37 AM, Rick Hillegas wrote:
> Hi Mike,
> I have attached a patch to DERBY-3155 which introduces 
> BackingStoreHashtables which include RowLocation information: 
> derby-3155-03-ae-backingStoreHashtableWithRowLocation.diff. I would 
> appreciate your feedback.
> Thanks,
> -Rick
> On 10/2/13 6:34 AM, Rick Hillegas wrote:
>> Thanks for the quick response, Mike. Some more discussion inline...
>> On 10/1/13 3:20 PM, mike matrigali wrote:
>>> I don't have any good answers here, but maybe some places to look - and
>>> some questions.
>>> Are you ever going to need RowLocations of rows in an index?  If so,
>>> this is going to be very new territory, as Derby has never done that.
>>> For a btree the RowLocation would be just the actual row, as the
>>> location is defined by the key - there is no other quick way given
>>> regular row level locking, as the row is free to move from page to
>>> page and slot to slot.  The abstraction of RowLocation was designed to
>>> handle this, as we wanted to be able to support a btree base table if
>>> necessary, but no actual implementation was ever done.
>> The MERGE statement shouldn't need the RowLocations of index rows. 
>> MERGE is only interested in the base rows.
>>> When you see a reference in the code to a RowLocation being at
>>> column "N+1", it is usually an index, where the code assumes the
>>> RowLocation at the end of the row is the RowLocation of the
>>> associated row in the heap.  So it might be confusing if what you are
>>> looking for is the RowLocation of the current row.  In the case of
>>> indexes this row location is actually stored as the N+1 column, so it
>>> makes sense to return it in the row.
>> Thanks. I can see that avoiding that pattern will reduce confusion.
>>> You might look at current interfaces that use the
>>> RowLocationRetRowSource.  I don't think any of these solve your
>>> current problem, but they may give insight into how it was handled in
>>> the past.  This looks like at least one earlier approach to allowing
>>> caller access to RowLocations from bulk type scans.  I think it is
>>> mostly used currently to scan a table once and then build indexes.
>>> In this case it is left up to the caller to maintain the separate
>>> information about each row.
>> Thanks, I'll take a look at that.
>>> Is there some write-up on the algorithm needed for merge so that I
>>> could understand the requirements of the interface?  I have not read
>>> up on this project, so if it is already documented just point me there.
>> The issue is DERBY-3155. There's a functional spec attached to that 
>> issue. The implementation is evolving as I feel my way forward. A 
>> high level description of the approach I'm trying right now is in a 
>> 2013-08-20 comment on that issue. In a nutshell, this is it:
>> o First run a left join to determine the list of rows which need to 
>> be touched.
>> o As the left join is processed, figure out which (if any) MERGE 
>> action applies to each row. Each MERGE action will have its own 
>> temporary table for buffering these rows.
>> o Then use the temporary tables to drive the corresponding MERGE 
>> actions.
>> The RowLocations are needed for the DELETE and UPDATE actions.
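The three steps above can be sketched roughly as follows. This is only an illustration, not the DERBY-3155 patch: `MergeBufferSketch`, `JoinedRow`, `classify` and the matching rule are hypothetical names and assumptions, in-memory lists stand in for the temporary tables, and a `Long` stands in for a RowLocation.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

final class MergeBufferSketch {
    enum Action { INSERT, UPDATE, DELETE }

    /** One row produced by the left join (hypothetical shape). */
    static final class JoinedRow {
        final int sourceValue;
        final Long targetLocation; // null when no target row matched

        JoinedRow(int sourceValue, Long targetLocation) {
            this.sourceValue = sourceValue;
            this.targetLocation = targetLocation;
        }
    }

    /** Assumed rule, for illustration only: decides which action, if any, applies. */
    static Action classify(JoinedRow row) {
        if (row.targetLocation == null) return Action.INSERT; // unmatched source row
        if (row.sourceValue < 0) return Action.DELETE;        // matched, delete clause
        if (row.sourceValue > 0) return Action.UPDATE;        // matched, update clause
        return null;                                          // no action applies
    }

    /** Buffers each joined row under the action that will later consume it. */
    static Map<Action, List<JoinedRow>> bufferByAction(List<JoinedRow> joined) {
        Map<Action, List<JoinedRow>> buffers = new EnumMap<>(Action.class);
        for (Action a : Action.values()) {
            buffers.put(a, new ArrayList<>());
        }
        for (JoinedRow row : joined) {
            Action a = classify(row);
            if (a != null) {
                buffers.get(a).add(row);
            }
        }
        return buffers;
    }
}
```

In this sketch the UPDATE and DELETE buffers are the ones whose rows carry a non-null location, which is why those two actions are the ones that need it.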
>>> At the interface level a key question is whether the generic openScan
>>> interface needs to change.  Once that happens, a lot of the other
>>> interfaces need to change also.  There are a lot of interfaces that
>>> were added for better scan performance for a specific need, so maybe
>>> this is just another one.
>>> A clean interface that comes to mind would be to create a new class for
>>> row return that is more than just Object[].  In this case it is likely
>>> 2 fields: Object[] and RowLocation.  Then probably a new type of
>>> create-hash-table call that creates one loaded with these new types
>>> of rows.  And then alter the interfaces to build this extra overhead
>>> if necessary.  I like this approach rather than adding the "fake"
>>> field onto the end of the row, as it avoids bugs that incorrectly
>>> treat the field as a real field for such things as hashing, sorting,
>>> duplicate key determination, ...
>> Thanks, I like that approach.
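A minimal sketch of the two-field row idea discussed above, under stated assumptions: `LocatedRowSketch`, `LocatedRow` and `Location` are hypothetical names rather than Derby classes, a page/slot pair stands in for a RowLocation, and a plain `HashMap` stands in for the backing-store hash table. The point it shows is that the location travels with the row but never takes part in hashing.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class LocatedRowSketch {
    /** Hypothetical stand-in for a RowLocation. */
    static final class Location {
        final long page;
        final int slot;

        Location(long page, int slot) {
            this.page = page;
            this.slot = slot;
        }
    }

    /** A scanned row: the real columns and the location, kept as separate fields. */
    static final class LocatedRow {
        final Object[] columns;
        final Location location;

        LocatedRow(Object[] columns, Location location) {
            this.columns = columns;
            this.location = location;
        }
    }

    /** Builds a hash table keyed on one real column; the location is never hashed. */
    static Map<Object, List<LocatedRow>> buildHashTable(List<LocatedRow> rows, int keyColumn) {
        Map<Object, List<LocatedRow>> table = new HashMap<>();
        for (LocatedRow row : rows) {
            table.computeIfAbsent(row.columns[keyColumn], k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        List<LocatedRow> rows = Arrays.asList(
            new LocatedRow(new Object[] {1, "a"}, new Location(10L, 0)),
            new LocatedRow(new Object[] {1, "b"}, new Location(10L, 1)),
            new LocatedRow(new Object[] {2, "c"}, new Location(11L, 0)));
        Map<Object, List<LocatedRow>> table = buildHashTable(rows, 0);
        // Rows with equal keys but different locations land in the same bucket.
        System.out.println(table.get(1).size());
    }
}
```

Because the key is taken only from `columns`, two rows with the same key value and different locations are treated as duplicates of each other for hashing purposes, which is the behaviour a trailing "fake" column would put at risk.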
>>> It is my understanding that hash tables are one of the key
>>> performance features of the system currently, so it would be nice
>>> not to add overhead to the main line path for this feature if possible.
>> Agreed. That has been my approach so far.
>> Thanks,
>> -Rick
>>> On 10/1/2013 10:29 AM, Rick Hillegas wrote:
>>>> I need some advice about how to design an api for requesting that the
>>>> Store include RowLocations in the rows that it scans and hands back to
>>>> the language layer.
>>>> The immediate problem that I'm working on involves implementing the
>>>> MERGE statement (DERBY-3155). Part of the implementation involves
>>>> cooking up a left join between two tables. I need to get back
>>>> RowLocations for the right table of that join. In a particular problem
>>>> case which I'm examining, the optimizer picks a HashJoin strategy for
>>>> the left join. That turns into a HashLeftOuterJoinResultSet at
>>>> execution time. And that, in turn, involves having the Store create
>>>> and fill a BackingStoreHashTableFromScan.
>>>> The BackingStoreHashTableFromScan is created with a scanColumnList (a
>>>> FormatableBitSet) which specifies some actual columns in the row as
>>>> well as a trailing column position which is meant to represent the
>>>> RowLocation. That trailing column position is represented as 1 plus
>>>> the actual row length. BackingStoreHashTableFromScan doesn't know
>>>> what to make of that column position and silently ignores it. So
>>>> clearly either that's the wrong api for asking for RowLocations or
>>>> BackingStoreHashTableFromScan needs to be taught some new tricks.
>>>> So the question is this: what's the right way to ask
>>>> BackingStoreHashTableFromScan to build a hash table whose rows contain
>>>> some set of real column positions plus a trailing RowLocation
>>>> column? I may stumble into other situations where I need to ask a
>>>> scan to put RowLocations into the rows it returns. So it would be
>>>> good to have a general pattern here for requesting this special
>>>> column.
>>>> Thanks,
>>>> -Rick
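The symptom described in the original question, a trailing column position that the scan silently ignores, can be illustrated with a small sketch. Everything here is an assumption made for illustration: `java.util.BitSet` stands in for FormatableBitSet, bit positions are 0-based with the extra bit placed just past the last real column, and `ScanColumnListSketch` and its methods are hypothetical names, not Derby code.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

final class ScanColumnListSketch {
    /** Requests the given 0-based real columns plus one extra bit just past the row's end. */
    static BitSet withTrailingLocation(int rowLength, int... realColumns) {
        BitSet wanted = new BitSet(rowLength + 1);
        for (int c : realColumns) {
            wanted.set(c);
        }
        wanted.set(rowLength); // stand-in for "please also return the RowLocation"
        return wanted;
    }

    /** A projection that only walks real columns, so the trailing bit is never seen. */
    static List<Object> project(Object[] row, BitSet wanted) {
        List<Object> out = new ArrayList<>();
        for (int c = wanted.nextSetBit(0); c >= 0 && c < row.length; c = wanted.nextSetBit(c + 1)) {
            out.add(row[c]);
        }
        return out;
    }
}
```

With a three-column row and a request for columns 0 and 2, the bit set carries a third bit at position 3, yet `project` returns only two values and reports no error. A consumer that wants the location has to be told about it through some other channel.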
