db-derby-dev mailing list archives

From mike matrigali <mikema...@gmail.com>
Subject Re: Store api question: how to ask for RowLocations
Date Tue, 01 Oct 2013 22:20:13 GMT
I don't have any good answers here, but maybe some places to look - and
some questions.

Will you ever need RowLocations of rows in an index?  If so, this
is very new territory; Derby has never done that.  For a
btree the RowLocation would be just the actual row, as the location is
defined by the key - there is no other quick way given regular row
level locking, as the row is free to move from page to page and slot to
slot.  The abstraction of RowLocation was designed to handle this, as
we wanted to be able to support a btree base table if necessary, but
no actual implementation was ever done.

When you see a reference in the code to a RowLocation being
at column "N+1", it is usually an index, where the code assumes the
RowLocation at the end of the row is the RowLocation of the associated
row in the heap.  So it might
be confusing if what you are looking for is the RowLocation of the
current row.  In the case of indexes this row location is actually
stored as the N+1 column, so it makes sense to return it in the row.
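
To make the "N+1" convention concrete, here is a minimal sketch of an
index row as a plain Object[] whose last element is the location of the
matching heap row.  HeapLocation and IndexRowLayout are made-up names
for illustration only; they are not Derby classes, and the real index
row format is not shown here.

```java
// Hypothetical stand-in for a heap row address (page + slot); not a Derby class.
final class HeapLocation {
    final long page;
    final int slot;

    HeapLocation(long page, int slot) {
        this.page = page;
        this.slot = slot;
    }
}

// Illustrative helpers for an index row laid out as N key columns + 1 location.
final class IndexRowLayout {
    // Build an index row: the N key columns followed by the heap row's location.
    static Object[] makeIndexRow(Object[] keys, HeapLocation heapRow) {
        Object[] row = new Object[keys.length + 1];
        System.arraycopy(keys, 0, row, 0, keys.length);
        row[keys.length] = heapRow;   // the "N+1" column
        return row;
    }

    // The real key columns are everything except the trailing location.
    static Object[] keyColumns(Object[] indexRow) {
        Object[] keys = new Object[indexRow.length - 1];
        System.arraycopy(indexRow, 0, keys, 0, keys.length);
        return keys;
    }

    // The trailing column points at the heap row, not at the index row itself.
    static HeapLocation heapLocation(Object[] indexRow) {
        return (HeapLocation) indexRow[indexRow.length - 1];
    }
}
```

Note that in this layout the trailing column says where the base row
lives in the heap; it says nothing about where the index row itself is.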

You might look at current interfaces that use
RowLocationRetRowSource.  I don't think any of these solve your current
problem, but they may give insight into how it was handled in the past.  This
looks like at least one past approach
to allowing caller access to RowLocations from bulk type scans.  I
think it is mostly used currently to scan a table once and then build
indexes.  In this case it is left up to the caller to maintain the
separate information about each row.

Is there some write-up on the algorithm needed for merge, so that I could
understand the requirements of the interface?  I have not read up
on this project, so if it is already documented just point me there.

At the interface level a key question is whether the generic openScan
interface needs to change; once that happens a lot of the other
interfaces need to change also.  There are a lot of interfaces that
were added for better scan performance for a specific need, so maybe
this is just another one.

A clean interface that comes to mind would be to create a new class for
row return that is more than just Object[].  In this case it is likely
2 fields: Object[] and RowLocation.  Then probably a new type of
hash table creation that builds one loaded with these new types of rows,
and then alter the interfaces to build
this extra overhead if necessary.  I like this approach rather than
adding the "fake" field onto the end of the row, as it avoids bugs
that incorrectly treat the field as a real field for such things as
hashing, sorting, duplicate key determination, ...
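
A rough sketch of that idea, just to show the shape.  Every name here
(RowLocationStub, PageSlot, LocatedRow, LocatedRowHashTable) is
hypothetical and stands in for whatever the Store would really use; none
of this is existing Derby code.  The point is that the hash key is built
only from real columns, so the location can never leak into hashing or
duplicate checks.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for Derby's RowLocation so the sketch compiles alone.
interface RowLocationStub {}

// Hypothetical heap-style location: a page number and a slot on that page.
final class PageSlot implements RowLocationStub {
    final long page;
    final int slot;
    PageSlot(long page, int slot) { this.page = page; this.slot = slot; }
}

// A returned row that is "more than just Object[]": columns plus location.
final class LocatedRow {
    private final Object[] columns;
    private final RowLocationStub location;

    LocatedRow(Object[] columns, RowLocationStub location) {
        this.columns = columns;
        this.location = location;
    }

    Object[] columns() { return columns; }
    RowLocationStub location() { return location; }
}

// A hash table loaded with LocatedRows, keyed on selected real columns only.
final class LocatedRowHashTable {
    private final int[] keyColumns;
    private final Map<List<Object>, List<LocatedRow>> buckets = new HashMap<>();

    LocatedRowHashTable(int[] keyColumns) { this.keyColumns = keyColumns.clone(); }

    // The key never includes the location, only the chosen column values.
    private List<Object> keyOf(Object[] columns) {
        List<Object> key = new ArrayList<>(keyColumns.length);
        for (int c : keyColumns) {
            key.add(columns[c]);
        }
        return key;
    }

    void put(LocatedRow row) {
        buckets.computeIfAbsent(keyOf(row.columns()), k -> new ArrayList<>()).add(row);
    }

    // Look up all rows whose key columns equal the given values.
    List<LocatedRow> get(Object... keyValues) {
        return buckets.getOrDefault(Arrays.asList(keyValues), Collections.emptyList());
    }
}

final class LocatedRowDemo {
    public static void main(String[] args) {
        LocatedRowHashTable table = new LocatedRowHashTable(new int[] {0});
        table.put(new LocatedRow(new Object[] {1, "a"}, new PageSlot(10, 3)));
        table.put(new LocatedRow(new Object[] {1, "b"}, new PageSlot(10, 4)));
        for (LocatedRow row : table.get(1)) {
            PageSlot where = (PageSlot) row.location();
            System.out.println(Arrays.toString(row.columns())
                    + " @ page " + where.page + ", slot " + where.slot);
        }
    }
}
```

Callers that do not need locations would keep using the existing
Object[] path untouched, which is how this avoids adding cost to the
main line.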

It is my understanding that hash tables are currently one of the key
performance features of the system,
so it would be nice not to add overhead to the main line path
for this feature if possible.

On 10/1/2013 10:29 AM, Rick Hillegas wrote:
> I need some advice about how to design an api for requesting that the
> Store include RowLocations in the rows that it scans and hands back to
> the language layer.
> The immediate problem that I'm working on involves implementing the
> MERGE statement (DERBY-3155). Part of the implementation involves
> cooking up a left join between two tables. I need to get back
> RowLocations for the right table of that join. In a particular problem
> case which I'm examining, the optimizer picks a HashJoin strategy for
> the left join. That turns into a HashLeftOuterJoinResultSet at execution
> time. And that, in turn, involves having the Store create and fill a
> BackingStoreHashTableFromScan.
> The BackingStoreHashTableFromScan is created with a scanColumnList (a
> FormatableBitSet) which specifies some actual columns in the row as well
> as a trailing column position which is meant to represent the
> RowLocation. That trailing column position is represented as 1 plus the
> actual row length. BackingStoreHashTableFromScan doesn't know what to
> make of that column position and silently ignores it. So clearly either
> that's the wrong api for asking for RowLocations or
> BackingStoreHashTableFromScan needs to be taught some new tricks.
> So the question is this: what's the right way to ask
> BackingStoreHashTableFromScan to build a hash table whose rows contain
> some set of real column positions plus a trailing RowLocation column? I
> may stumble into other situations where I need to ask a scan to put
> RowLocations into the rows it returns. So it would be good to have a
> general pattern here for requesting this special column.
> Thanks,
> -Rick
