db-derby-dev mailing list archives

From Rick Hillegas <rick.hille...@oracle.com>
Subject Re: Store api question: how to ask for RowLocations
Date Wed, 02 Oct 2013 13:34:24 GMT
Thanks for the quick response, Mike. Some more discussion inline...

On 10/1/13 3:20 PM, mike matrigali wrote:
> I don't have any good answers here, but maybe some places to look - and
> some questions.
> Are you going to ever need RowLocations of rows in an index?  If so this
> is going to be very new territory and Derby has never done that.  For 
> a btree the RowLocation would be just the actual row as the location is
> defined by the key - there is no other quick way given regular row
> level locking as the row is free to move from page to page and slot to
> slot.  The abstraction of RowLocation was designed to handle this as
> we wanted to be able to support a btree base table if necessary, but
> no actual implementation was ever done.
The MERGE statement shouldn't need the RowLocations of index rows. MERGE 
is only interested in the base rows.
> When you see a reference in the code to a RowLocation being
> at column "N+1", it is usually an index where the code assumes the
> RowLocation at the end of the row is the RowLocation of the associated
> row in the heap.  So it might
> be confusing if what you are looking for is the RowLocation of the
> current row.  In the case of indexes this row location is actually
> stored as the N+1 column, so it makes sense to return it in the row.
Thanks. I can see that avoiding that pattern will reduce confusion.
> You might look at current interfaces that use the 
> RowLocationRetRowSource.  I don't think any of these solve your 
> current problem but may give insight in how it was handled in the 
> past.  This looks like at least one approach in the
> past to allow caller access to RowLocations from bulk type scans.  I 
> think it is mostly used currently to scan table once and then build 
> indexes.  In this case it is left up to the caller to maintain the
> separate information about each row.
Thanks, I'll take a look at that.
> Is there some write up on the algo needed for merge so that I could
> understand the requirements of the interface?  I have not read up
> on this project so if it is already documented just point me there.
The issue is DERBY-3155. There's a functional spec attached to that 
issue. The implementation is evolving as I feel my way forward. A
high-level description of the approach I'm trying right now is in a
2013-08-20 comment on that issue. In a nutshell, this is it:

o First run a left join to determine the list of rows which need to be 
o As the left join is processed, figure out which (if any) MERGE action 
applies to each row. Each MERGE action will have its own temporary table 
for buffering these rows.
o Then use the temporary tables to drive the corresponding MERGE actions.

The RowLocations are needed for the DELETE and UPDATE actions.
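As a rough illustration of the outline above, here is a minimal sketch in plain Java. It is not Derby code: the class MergeSketch, the toy Target table, the use of an Integer as a stand-in for a RowLocation, and the rule that a matched row with a null source value means DELETE are all assumptions made up for this example. It only shows the shape of the approach: join, classify each joined row, buffer it per action, then drive the actions from the buffers.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class MergeSketch {
    enum Action { INSERT, UPDATE, DELETE }

    /** Hypothetical joined row: source columns plus the matching target row's location (null if unmatched). */
    static final class JoinedRow {
        final int key;
        final String value;
        final Integer targetLocation; // stand-in for a RowLocation
        JoinedRow(int key, String value, Integer targetLocation) {
            this.key = key;
            this.value = value;
            this.targetLocation = targetLocation;
        }
    }

    /** Toy target table: location -> {key, value}. */
    static final class Target {
        final Map<Integer, Object[]> rows = new TreeMap<>();
        private int nextLocation = 0;
        int insert(int key, String value) {
            rows.put(nextLocation, new Object[] { key, value });
            return nextLocation++;
        }
    }

    /** Steps 1 and 2: left join source to target, then buffer each joined row under its action. */
    static Map<Action, List<JoinedRow>> bufferActions(Map<Integer, String> source, Target target) {
        // Hash the target (right side of the join) on its key, remembering each row's location.
        Map<Integer, Integer> keyToLocation = new HashMap<>();
        for (Map.Entry<Integer, Object[]> e : target.rows.entrySet()) {
            keyToLocation.put((Integer) e.getValue()[0], e.getKey());
        }
        Map<Action, List<JoinedRow>> buffers = new EnumMap<>(Action.class);
        for (Action a : Action.values()) {
            buffers.put(a, new ArrayList<>());
        }
        for (Map.Entry<Integer, String> s : source.entrySet()) {
            Integer location = keyToLocation.get(s.getKey()); // null means no match
            Action action;
            if (location == null) {
                action = Action.INSERT;
            } else if (s.getValue() == null) {
                action = Action.DELETE; // assumed rule, for illustration only
            } else {
                action = Action.UPDATE;
            }
            buffers.get(action).add(new JoinedRow(s.getKey(), s.getValue(), location));
        }
        return buffers;
    }

    /** Step 3: drive each action from its buffer. DELETE and UPDATE address target rows by location. */
    static void applyActions(Map<Action, List<JoinedRow>> buffers, Target target) {
        for (JoinedRow r : buffers.get(Action.DELETE)) {
            target.rows.remove(r.targetLocation);
        }
        for (JoinedRow r : buffers.get(Action.UPDATE)) {
            target.rows.get(r.targetLocation)[1] = r.value;
        }
        for (JoinedRow r : buffers.get(Action.INSERT)) {
            target.insert(r.key, r.value);
        }
    }

    public static void main(String[] args) {
        Target target = new Target();
        target.insert(1, "a");
        target.insert(2, "b");
        Map<Integer, String> source = new HashMap<>();
        source.put(1, "a2");  // matched: UPDATE
        source.put(2, null);  // matched with null value: DELETE (assumed rule)
        source.put(3, "c");   // unmatched: INSERT
        applyActions(bufferActions(source, target), target);
        for (Map.Entry<Integer, Object[]> e : target.rows.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue()[0] + ", " + e.getValue()[1]);
        }
    }
}
```

The point of the sketch is that only the DELETE and UPDATE buffers ever read targetLocation, which is why the join has to carry the location of the right-side row along with its columns.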
> At the interface level a key question is if the generic openScan 
> interface needs to change, once that happens a lot of the other 
> interfaces need to change also.  There are a lot of interfaces that
> were added for better scan performance for a specific need so maybe
> this is just another one.
> A clean interface that comes to mind would be to create a new class for
> row return that is more than just Object[].  In this case it is likely
> 2 fields: Object[] and RowLocation.  Then probably a new type of
> hash table creation that builds one loaded with these new types of rows.
> And then alter the interfaces to build
> this extra overhead if necessary.  I like this approach rather than
> adding the "fake" field onto the end of the row as it avoids bugs
> that incorrectly treat the field as a real field for such things as
> hashing, sorting, duplicate key determination, ...
Thanks, I like that approach.
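
To make the idea concrete, here is a minimal sketch of such a row class in plain Java. Everything in it is hypothetical: the RowLocation interface below is a local stand-in and not Derby's real one, and LocatedRow, PageSlot and buildHashTable are names invented for this example. It shows the location carried beside the Object[] instead of inside it, so the hash key is built from real columns only.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class LocatedRowSketch {
    /** Local stand-in for a row location; NOT Derby's actual RowLocation interface. */
    interface RowLocation {}

    /** Hypothetical heap-style location. */
    static final class PageSlot implements RowLocation {
        final long page;
        final int slot;
        PageSlot(long page, int slot) {
            this.page = page;
            this.slot = slot;
        }
        @Override
        public String toString() {
            return "(" + page + "," + slot + ")";
        }
    }

    /** The "more than just Object[]" row: real columns plus a separate location field. */
    static final class LocatedRow {
        final Object[] columns;
        final RowLocation location;
        LocatedRow(Object[] columns, RowLocation location) {
            this.columns = columns;
            this.location = location;
        }
    }

    /** Builds a hash table keyed on the given column positions; the location never enters the key. */
    static Map<List<Object>, List<LocatedRow>> buildHashTable(List<LocatedRow> scan, int[] keyColumns) {
        Map<List<Object>, List<LocatedRow>> table = new HashMap<>();
        for (LocatedRow row : scan) {
            List<Object> key = new ArrayList<>();
            for (int c : keyColumns) {
                key.add(row.columns[c]);
            }
            table.computeIfAbsent(key, k -> new ArrayList<>()).add(row);
        }
        return table;
    }

    public static void main(String[] args) {
        List<LocatedRow> scan = new ArrayList<>();
        scan.add(new LocatedRow(new Object[] { 1, "x" }, new PageSlot(1, 0)));
        scan.add(new LocatedRow(new Object[] { 1, "y" }, new PageSlot(7, 3)));
        scan.add(new LocatedRow(new Object[] { 2, "z" }, new PageSlot(7, 4)));
        Map<List<Object>, List<LocatedRow>> table = buildHashTable(scan, new int[] { 0 });
        for (Map.Entry<List<Object>, List<LocatedRow>> e : table.entrySet()) {
            System.out.println(e.getKey() + " has " + e.getValue().size() + " row(s)");
        }
    }
}
```

Because the location lives in its own field, two rows with equal key columns but different locations land in the same bucket, and nothing that hashes, sorts or compares the Object[] can mistake the location for a real column.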
> It is my understanding that hash tables are one of the key performance 
> features of the system
> currently so would be nice to not add overhead to the main line path
> for this feature if possible.
Agreed. That has been my approach so far.

> On 10/1/2013 10:29 AM, Rick Hillegas wrote:
>> I need some advice about how to design an api for requesting that the
>> Store include RowLocations in the rows that it scans and hands back to
>> the language layer.
>> The immediate problem that I'm working on involves implementing the
>> MERGE statement (DERBY-3155). Part of the implementation involves
>> cooking up a left join between two tables. I need to get back
>> RowLocations for the right table of that join. In a particular problem
>> case which I'm examining, the optimizer picks a HashJoin strategy for
>> the left join. That turns into a HashLeftOuterJoinResultSet at execution
>> time. And that, in turn, involves having the Store create and fill a
>> BackingStoreHashTableFromScan.
>> The BackingStoreHashTableFromScan is created with a scanColumnList (a
>> FormatableBitSet) which specifies some actual columns in the row as well
>> as a trailing column position which is meant to represent the
>> RowLocation. That trailing column position is represented as 1 plus the
>> actual row length. BackingStoreHashTableFromScan doesn't know what to
>> make of that column position and silently ignores it. So clearly either
>> that's the wrong api for asking for RowLocations or
>> BackingStoreHashTableFromScan needs to be taught some new tricks.
>> So the question is this: what's the right way to ask
>> BackingStoreHashTableFromScan to build a hash table whose rows contain
>> some set of real column positions plus a trailing RowLocation column? I
>> may stumble into other situations where I need to ask a scan to put
>> RowLocations into the rows it returns. So it would be good to have a
>> general pattern here for requesting this special column.
>> Thanks,
>> -Rick
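
The trailing-column symptom described in the original question can be illustrated with a toy model. This sketch uses java.util.BitSet as a stand-in for FormatableBitSet and a made-up project method as a stand-in for the scan; neither reflects Derby's actual code, and zero-based bit positions are an assumption of the example. It contrasts a projection that silently drops an out-of-range bit with one that takes the location as an explicit argument and rejects out-of-range bits.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

class ScanColumnListSketch {
    /** Toy projection: copies the selected columns and silently ignores bits past the row's end. */
    static List<Object> project(Object[] row, BitSet scanColumnList) {
        List<Object> out = new ArrayList<>();
        for (int i = scanColumnList.nextSetBit(0);
             i >= 0 && i < row.length;
             i = scanColumnList.nextSetBit(i + 1)) {
            out.add(row[i]);
        }
        return out;
    }

    /** Hypothetical alternative: the location is requested explicitly, not encoded as a fake column bit. */
    static List<Object> projectWithLocation(Object[] row, BitSet scanColumnList, Object location) {
        // BitSet.length() is the index of the highest set bit plus one.
        if (scanColumnList.length() > row.length) {
            throw new IllegalArgumentException("column position past end of row");
        }
        List<Object> out = project(row, scanColumnList);
        out.add(location);
        return out;
    }

    public static void main(String[] args) {
        Object[] row = { "a", "b", "c" };

        BitSet withFakeColumn = new BitSet();
        withFakeColumn.set(0);
        withFakeColumn.set(2);
        withFakeColumn.set(row.length); // the "trailing" position meant to request the location
        System.out.println(project(row, withFakeColumn));

        BitSet realColumnsOnly = new BitSet();
        realColumnsOnly.set(0);
        realColumnsOnly.set(2);
        System.out.println(projectWithLocation(row, realColumnsOnly, "loc"));
    }
}
```

In the first call the request for the location is lost without any error, which mirrors the behaviour reported above; in the second the caller's intent is part of the method signature, so it cannot be ignored quietly.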
