db-derby-dev mailing list archives

From mike matrigali <mikema...@gmail.com>
Subject Re: Store api question: how to ask for RowLocations
Date Tue, 15 Oct 2013 23:08:12 GMT
ok, thanks for the heads up.  will look at it later this week.

/mikem

On 10/15/2013 5:10 AM, Rick Hillegas wrote:
> Hi Mike,
>
> There's a revised patch,
> derby-3155-03-af-backingStoreHashtableWithRowLocation.diff, waiting for
> your review when you have the cycles.
>
> Thanks,
> -Rick
>
> On 10/4/13 11:37 AM, Rick Hillegas wrote:
>> Hi Mike,
>>
>> I have attached a patch to DERBY-3155 which introduces
>> BackingStoreHashtables which include RowLocation information:
>> derby-3155-03-ae-backingStoreHashtableWithRowLocation.diff. I would
>> appreciate your feedback.
>>
>> Thanks,
>> -Rick
>>
>> On 10/2/13 6:34 AM, Rick Hillegas wrote:
>>> Thanks for the quick response, Mike. Some more discussion inline...
>>>
>>> On 10/1/13 3:20 PM, mike matrigali wrote:
>>>> I don't have any good answers here, but maybe some places to look - and
>>>> some questions.
>>>>
>>>> Are you ever going to need RowLocations of rows in an index?  If so,
>>>> this is going to be very new territory; Derby has never done that. For
>>>> a btree the RowLocation would be just the actual row, as the location is
>>>> defined by the key - there is no other quick way given regular row
>>>> level locking, as the row is free to move from page to page and slot to
>>>> slot.  The abstraction of RowLocation was designed to handle this, as
>>>> we wanted to be able to support a btree base table if necessary, but
>>>> no actual implementation was ever done.
>>> The MERGE statement shouldn't need the RowLocations of index rows.
>>> MERGE is only interested in the base rows.
>>>>
>>>> When you see a reference in the code to a RowLocation being
>>>> at column "N+1", it is usually an index, where the code assumes the
>>>> RowLocation at the end of the row is the RowLocation of the
>>>> associated row in the heap.  So it might
>>>> be confusing if what you are looking for is the RowLocation of the
>>>> current row.  In the case of indexes this row location is actually
>>>> stored as the N+1 column, so it makes sense to return it in the row.
>>> Thanks. I can see that avoiding that pattern will reduce confusion.
>>>>
>>>> You might look at current interfaces that use the
>>>> RowLocationRetRowSource.  I don't think any of these solve your
>>>> current problem, but they may give insight into how it was handled in
>>>> the past.  This looks like at least one past approach
>>>> to giving the caller access to RowLocations from bulk type scans.  I
>>>> think it is mostly used currently to scan a table once and then build
>>>> indexes.  In this case it is left up to the caller to maintain the
>>>> separate information about each row.
>>> Thanks, I'll take a look at that.
>>>>
>>>> Is there some write-up on the algorithm needed for merge so that I could
>>>> understand the requirements of the interface?  I have not read up
>>>> on this project, so if it is already documented just point me there.
>>> The issue is DERBY-3155. There's a functional spec attached to that
>>> issue. The implementation is evolving as I feel my way forward. A
>>> high level description of the approach I'm trying right now is
>>> described in a 2013-08-20 comment on that issue. In a nutshell, this
>>> is it:
>>>
>>> o First run a left join to determine the list of rows which need to
>>> be touched.
>>> o As the left join is processed, figure out which (if any) MERGE
>>> action applies to each row. Each MERGE action will have its own
>>> temporary table for buffering these rows.
>>> o Then use the temporary tables to drive the corresponding MERGE
>>> actions.
>>>
>>> The RowLocations are needed for the DELETE and UPDATE actions.
>>>>
>>>> At the interface level a key question is whether the generic openScan
>>>> interface needs to change; once that happens a lot of the other
>>>> interfaces need to change also.  There are a lot of interfaces that
>>>> were added for better scan performance for a specific need, so maybe
>>>> this is just another one.
>>>>
>>>> A clean interface that comes to mind would be to create a new class for
>>>> row return that is more than just Object[].  In this case it is likely
>>>> 2 fields: Object[] and RowLocation.  Then probably a new type of
>>>> hash table creation that builds one loaded with these new types of rows.
>>>> And then alter the interfaces to build
>>>> this extra overhead if necessary.  I like this approach rather than
>>>> adding the "fake" field onto the end of the row, as it avoids bugs
>>>> that incorrectly treat the field as a real field for such things as
>>>> hashing, sorting, duplicate key determination, ...
>>> Thanks, I like that approach.
>>>>
>>>> It is my understanding that hash tables are one of the key
>>>> performance features of the system
>>>> currently so would be nice to not add overhead to the main line path
>>>> for this feature if possible.
>>> Agreed. That has been my approach so far.
>>>
>>> Thanks,
>>> -Rick
>>>>
>>>> On 10/1/2013 10:29 AM, Rick Hillegas wrote:
>>>>> I need some advice about how to design an api for requesting that the
>>>>> Store include RowLocations in the rows that it scans and hands back to
>>>>> the language layer.
>>>>>
>>>>> The immediate problem that I'm working on involves implementing the
>>>>> MERGE statement (DERBY-3155). Part of the implementation involves
>>>>> cooking up a left join between two tables. I need to get back
>>>>> RowLocations for the right table of that join. In a particular problem
>>>>> case which I'm examining, the optimizer picks a HashJoin strategy for
>>>>> the left join. That turns into a HashLeftOuterJoinResultSet at
>>>>> execution time. And that, in turn, involves having the Store create
>>>>> and fill a BackingStoreHashTableFromScan.
>>>>>
>>>>> The BackingStoreHashTableFromScan is created with a scanColumnList (a
>>>>> FormatableBitSet) which specifies some actual columns in the row as
>>>>> well as a trailing column position which is meant to represent the
>>>>> RowLocation. That trailing column position is represented as 1 plus
>>>>> the actual row length. BackingStoreHashTableFromScan doesn't know what
>>>>> to make of that column position and silently ignores it. So clearly
>>>>> either that's the wrong api for asking for RowLocations or
>>>>> BackingStoreHashTableFromScan needs to be taught some new tricks.
>>>>>
>>>>> So the question is this: what's the right way to ask
>>>>> BackingStoreHashTableFromScan to build a hash table whose rows contain
>>>>> some set of real column positions plus a trailing RowLocation
>>>>> column? I may stumble into other situations where I need to ask a scan
>>>>> to put RowLocations into the rows it returns. So it would be good to
>>>>> have a general pattern here for requesting this special column.
>>>>>
>>>>> Thanks,
>>>>> -Rick
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>>
>
>
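
Editorial sketch (not part of the original messages): the approach Mike suggests in the quoted thread, a row holder with two fields (Object[] and RowLocation) plus a hash table loaded with such rows, might look roughly like the Java below. Every name here (SketchRowLocation, LocatedRow, LocatedRowHashTable) is hypothetical and chosen for illustration only. None of it is Derby's actual API or the code in the DERBY-3155 patches, and the page/slot fields are only a stand-in for whatever a real RowLocation carries. The point it tries to show is that the location is kept out of the column array, so hashing and duplicate-key handling see only the real columns.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a row location; NOT Derby's RowLocation interface.
final class SketchRowLocation {
    final long pageNumber;
    final int slot;

    SketchRowLocation(long pageNumber, int slot) {
        this.pageNumber = pageNumber;
        this.slot = slot;
    }
}

// Hypothetical row holder: the real columns and the location are separate
// fields, so the location can never be mistaken for an ordinary column.
final class LocatedRow {
    final Object[] columns;
    final SketchRowLocation location;

    LocatedRow(Object[] columns, SketchRowLocation location) {
        this.columns = columns;
        this.location = location;
    }
}

// Hypothetical in-memory hash table keyed only on selected real columns.
// Rows with equal keys are kept together in one bucket.
final class LocatedRowHashTable {
    private final int[] keyColumns;
    private final Map<List<Object>, List<LocatedRow>> buckets = new HashMap<>();

    LocatedRowHashTable(int... keyColumns) {
        this.keyColumns = keyColumns.clone();
    }

    // Build the hash key from the key columns only; the location is ignored.
    private List<Object> keyOf(Object[] columns) {
        List<Object> key = new ArrayList<>(keyColumns.length);
        for (int position : keyColumns) {
            key.add(columns[position]);
        }
        return key;
    }

    void put(LocatedRow row) {
        buckets.computeIfAbsent(keyOf(row.columns), k -> new ArrayList<>()).add(row);
    }

    // Return all rows whose key columns equal the given values, in insertion order.
    List<LocatedRow> get(Object... keyValues) {
        return buckets.getOrDefault(Arrays.asList(keyValues), Collections.emptyList());
    }
}
```

In this sketch a caller such as a DELETE or UPDATE action would probe the table by join key and read `location` from each matching row, while scans that never ask for locations could keep using plain Object[] rows and pay none of the extra cost.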

